Copy for the elected Office (EO/US)



PATENT COOPERATION TREATY



PCT 

NOTIFICATION OF THE RECORDING 
OF A CHANGE 

(PCT Rule 92bis.1 and 
Administrative Instructions, Section 422) 


To: 

DE CLERCQ, Ann

De Clercq, Brants & Partners cv 

E. Gevaertdreef 10a 

B-9830 Sint-Martens-Latem 

BELGIQUE 


Date of mailing (day/month/year) 

07 November 2000 (07.11.00)


Applicant's or agent's file reference
29314/34158A


IMPORTANT NOTIFICATION 


International application No. 
PCT/IB99/01958 


International filing date (day/month/year) 
09 November 1999 (09.11.99) 



1. The following indications appeared on record concerning: 
[ ] the applicant    [ ] the inventor    [X] the agent    [ ] the common representative

State of Residence 



Name and Address 

DE CLERCQ, Ann 

Ann De Clercq & Co B.V.B.A. 

Brandstraat 100 

B-9830 Sint-Martens-Latem 

Belgium 



State of Nationality 



Telephone No. 

+32 (0)9 280 23 40



Facsimile No. 

+32 (0) 9 280 23 45 



Teleprinter No. 



2. The International Bureau hereby notifies the applicant that the following change has been recorded concerning: 

[ ] the person    [ ] the name    [X] the address    [ ] the nationality    [ ] the residence



Name and Address 

DE CLERCQ, Ann 

De Clercq, Brants & Partners cv 

E. Gevaertdreef 10a 

B-9830 Sint-Martens-Latem 

Belgium 



State of Nationality 



State of Residence 



Telephone No. 

+32 9 280 23 40 



Facsimile No. 

+32 9 280 23 45



Teleprinter No. 



3. Further observations, if necessary: 



4. A copy of this notification has been sent to: 
[X] the receiving Office
[ ] the International Searching Authority



[ ] the designated Offices concerned
[X] the elected Offices concerned
[ ] other:





Authorized officer 


The International Bureau of WIPO

Maria Victoria CORTIELLO

34, chemin des Colombettes

1211 Geneva 20, Switzerland

Facsimile No.: (41-22) 740.14.35

Telephone No.: (41-22) 338.83.38


Form PCT/IB/306 (March 1994) 


003639439 



PCT/IB99/01958 

PATENT COOPERATION TREATY



From the INTERNATIONAL BUREAU 



PCT 


NOTIFICATION OF ELECTION

(PCT Rule 61.2)

To:

Assistant Commissioner for Patents
United States Patent and Trademark Office
Box PCT
Washington, D.C. 20231
ETATS-UNIS D'AMERIQUE

in its capacity as elected Office

Date of mailing (day/month/year)
01 August 2000 (01.08.00)

International application No.
PCT/IB99/01958

Applicant's or agent's file reference
29314/34158A

International filing date (day/month/year)
09 November 1999 (09.11.99)

Priority date (day/month/year)
09 November 1998 (09.11.98)


Applicant 




ZABEAU, Marc et al




1. The designated Office is hereby notified of its election made:

[X] in the demand filed with the International Preliminary Examining Authority on:
05 June 2000 (05.06.00)

[ ] in a notice effecting later election filed with the International Bureau on:

2. The election was
[ ] was not
made before the expiration of 19 months from the priority date or, where Rule 32 applies, within the time limit under
Rule 32.2(b).






Authorized officer 


The International Bureau of WIPO

34, chemin des Colombettes

Olivia RANAIVOJAONA

1211 Geneva 20, Switzerland

Facsimile No.: (41-22) 740.14.35

Telephone No.: (41-22) 338.83.38


Form PCT/IB/331 (July 1992) 


IB9901958 



PATENT COOPERATION TREATY



From the INTERNATIONAL SEARCHING AUTHORITY 



To: 

DE CLERCQ, Ann 

Attn. Clough, David W. 

Ann De Clercq & Co. B.V.B.A. 

Brandstraat 100 

B-9830 Sint-Martens-Latem 

BELGIUM


NOTIFICATION OF TRANSMITTAL OF
THE INTERNATIONAL SEARCH REPORT
OR THE DECLARATION

(PCT Rule 44.1)


Date of mailing
(day/month/year) 14/06/2000


Applicant's or agent's file reference
29314/34158A


FOR FURTHER ACTION See paragraphs 1 and 4 below 


International application No. 

PCT/IB 99/01958 


International filing date
(day/month/year) 09/11/1999


Applicant

METHEXIS N.V. et al. 



1. [X] The applicant is hereby notified that the International Search Report has been established and is transmitted herewith.

Filing of amendments and statement under Article 19:

The applicant is entitled, if he so wishes, to amend the claims of the International Application (see Rule 46):

When? The time limit for filing such amendments is normally 2 months from the date of transmittal of the
International Search Report; however, for more details, see the notes on the accompanying sheet.

Where? Directly to the

International Bureau of WIPO
34, chemin des Colombettes
1211 Geneva 20, Switzerland
Facsimile No.: (41-22) 740.14.35

For more detailed instructions, see the notes on the accompanying sheet.

2. [ ] The applicant is hereby notified that no International Search Report will be established and that the declaration under
Article 17(2)(a) to that effect is transmitted herewith.

3. With regard to the protest against payment of (an) additional fee(s) under Rule 40.2, the applicant is notified that:

[ ] the protest together with the decision thereon has been transmitted to the International Bureau together with the
applicant's request to forward the texts of both the protest and the decision thereon to the designated Offices.

[ ] no decision has been made yet on the protest; the applicant will be notified as soon as a decision is made.

4. Further action(s): The applicant is reminded of the following:

Shortly after 18 months from the priority date, the international application will be published by the International Bureau.
If the applicant wishes to avoid or postpone publication, a notice of withdrawal of the international application, or of the
priority claim, must reach the International Bureau as provided in Rules 90bis.1 and 90bis.3, respectively, before the
completion of the technical preparations for international publication.

Within 19 months from the priority date, a demand for international preliminary examination must be filed if the applicant
wishes to postpone the entry into the national phase until 30 months from the priority date (in some Offices even later).

Within 20 months from the priority date, the applicant must perform the prescribed acts for entry into the national phase
before all designated Offices which have not been elected in the demand or in a later election within 19 months from the
priority date or could not be elected because they are not bound by Chapter II.
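
The month counts in item 4 can be turned into calendar dates with a few lines of code. The following is only an illustrative sketch, not part of the notification: the function and variable names are hypothetical, the priority date is the one stated on these forms (09 November 1998), and the calculation assumes plain calendar-month arithmetic with no adjustment for non-working days or for Offices that apply later limits.

```python
from datetime import date
import calendar


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumption: if the target month has no day with the same number,
    the result is clamped to the last day of that month.
    """
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Priority date as stated on the forms in this file.
PRIORITY_DATE = date(1998, 11, 9)

# Month counts taken from item 4 of the notification.
LIMITS = {
    "international publication (shortly after)": 18,
    "demand for preliminary examination": 19,
    "national phase entry without demand": 20,
    "national phase entry with demand": 30,
}


def deadlines(priority: date) -> dict:
    """Map each time limit label to its nominal calendar date."""
    return {label: add_months(priority, n) for label, n in LIMITS.items()}


if __name__ == "__main__":
    for label, due in deadlines(PRIORITY_DATE).items():
        print(f"{label}: {due.isoformat()}")
```

Under these assumptions the 19-month limit falls on 9 June 2000 and the 30-month limit on 9 May 2001.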





Name and mailing address of the International Searching Authority
European Patent Office, P.B. 5818 Patentlaan 2
NL-2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl,
Fax: (+31-70) 340-3016


Authorized officer

Nina Vercio



Form PCT/ISA/220 (July 1998) 



NOTES TO FORM PCT/ISA/220 (continued)



The letter must indicate the differences between the claims as filed and the claims as amended. It must, in
particular, indicate, in connection with each claim appearing in the international application (it being understood
that identical indications concerning several claims may be grouped), whether

(i) the claim is unchanged;

(ii) the claim is cancelled;

(iii) the claim is new;

(iv) the claim replaces one or more claims as filed;

(v) the claim is the result of the division of a claim as filed.



The following examples illustrate the manner in which amendments must be explained in the
accompanying letter:

1. [Where originally there were 48 claims and after amendment of some claims there are 51]:
"Claims 1 to 29, 31, 32, 34, 35, 37 to 48 replaced by amended claims bearing the same numbers;
claims 30, 33 and 36 unchanged; new claims 49 to 51 added."

2. [Where originally there were 15 claims and after amendment of all claims there are 11]:
"Claims 1 to 15 replaced by amended claims 1 to 11."

3. [Where originally there were 14 claims and the amendments consist in cancelling some claims and in adding
new claims]:

"Claims 1 to 6 and 14 unchanged; claims 7 to 13 cancelled; new claims 15, 16 and 17 added." or
"Claims 7 to 13 cancelled; new claims 15, 16 and 17 added; all other claims unchanged."

4. [Where various kinds of amendments are made]:

"Claims 1-10 unchanged; claims 11 to 13, 18 and 19 cancelled; claims 14, 15 and 16 replaced by amended
claim 14; claim 17 subdivided into amended claims 15, 16 and 17; new claims 20 and 21 added."
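
The bookkeeping behind such a letter (every originally filed claim must be given a status) can be checked mechanically. The sketch below is an illustration only and is not part of the Notes; the helper names `parse_claims` and `unaccounted` are hypothetical, and the parser assumes claim lists written with commas, "and", "to" or a hyphen, as in the examples above.

```python
import re


def parse_claims(text: str) -> set:
    """Expand a list such as '1 to 29, 31, 32, 37 to 48' into claim numbers."""
    claims = set()
    for part in re.split(r",|\band\b", text):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r"(\d+)\s*(?:to|-)\s*(\d+)", part)
        if match:
            first, last = int(match.group(1)), int(match.group(2))
            claims.update(range(first, last + 1))
        else:
            claims.add(int(part))
    return claims


def unaccounted(original_count: int, *groups: str) -> set:
    """Return the originally filed claim numbers that no group mentions."""
    filed = set(range(1, original_count + 1))
    mentioned = set().union(*(parse_claims(group) for group in groups))
    return filed - mentioned


if __name__ == "__main__":
    # Example 1 above: 48 claims as filed, some replaced, some unchanged.
    replaced = "1 to 29, 31, 32, 34, 35, 37 to 48"
    unchanged = "30, 33 and 36"
    print(unaccounted(48, replaced, unchanged))
```

An empty result means every claim as filed has been assigned a status; new claims (49 to 51 in example 1) are numbered beyond the original count and are therefore not part of this check.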



"Statement under article 19(1)" (Rule 46.4) 

The amendments may be accompanied by a statement explaining the amendments and indicating any impact
that such amendments might have on the description and the drawings (which cannot be amended under
Article 19(1)).

The statement will be published with the international application and the amended claims.
It must be in the language in which the international application is to be published.

It must be brief, not exceeding 500 words if in English or if translated into English.

It should not be confused with and does not replace the letter indicating the differences between the claims
as filed and as amended. It must be filed on a separate sheet and must be identified as such by a heading,
preferably by using the words "Statement under Article 19(1)."

It may not contain any disparaging comments on the international search report or the relevance of citations
contained in that report. Reference to citations, relevant to a given claim, contained in the international search
report may be made only in connection with an amendment of that claim.



Consequence if a demand for international preliminary examination has already been filed

If, at the time of filing any amendments under Article 19, a demand for international preliminary examination
has already been submitted, the applicant must preferably, at the same time of filing the amendments with the
International Bureau, also file a copy of such amendments with the International Preliminary Examining
Authority (see Rule 62.2(a), first sentence).



Consequence with regard to translation of the international application for entry into the national phase

The applicant's attention is drawn to the fact that, where upon entry into the national phase, a translation of the
claims as amended under Article 19 may have to be furnished to the designated/elected Offices, instead of, or
in addition to, the translation of the claims as filed.

For further details on the requirements of each designated/elected Office, see Volume II of the PCT Applicant's
Guide.




Notes to Form PCT/ISA/220 (second sheet) (January 1994)



PATENT COOPERATION TREATY

PCT



INTERNATIONAL SEARCH REPORT 



(PCT Article 18 and Rules 43 and 44) 


Applicant's or agent's file reference
29314/34158A

FOR FURTHER ACTION
see Notification of Transmittal of International Search Report (Form PCT/ISA/220) as well as, where applicable, item 5 below.

International application No.
PCT/IB 99/01958

International filing date (day/month/year)
09/11/1999

(Earliest) Priority Date (day/month/year)
09/11/1998

Applicant

METHEXIS N.V. et al.







This International Search Report has been prepared by this International Searching Authority and is transmitted to the applicant
according to Article 18. A copy is being transmitted to the International Bureau.

This International Search Report consists of a total of 4 sheets.

[X] It is also accompanied by a copy of each prior art document cited in this report.

1. Basis of the report

a. With regard to the language, the international search was carried out on the basis of the international application in the
language in which it was filed, unless otherwise indicated under this item.

[ ] the international search was carried out on the basis of a translation of the international application furnished to this
Authority (Rule 23.1(b)).

b. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the international search
was carried out on the basis of the sequence listing:

[X] contained in the international application in written form.

[X] filed together with the international application in computer readable form.

[ ] furnished subsequently to this Authority in written form.

[ ] furnished subsequently to this Authority in computer readable form.

[ ] the statement that the subsequently furnished written sequence listing does not go beyond the disclosure in the
international application as filed has been furnished.

[ ] the statement that the information recorded in computer readable form is identical to the written sequence listing has been
furnished.

2. [ ] Certain claims were found unsearchable (see Box I).
3. [ ] Unity of invention is lacking (see Box II).

4. With regard to the title,

[X] the text is approved as submitted by the applicant.

[ ] the text has been established by this Authority to read as follows:

With regard to the abstract,

[X] the text is approved as submitted by the applicant.

[ ] the text has been established, according to Rule 38.2(b), by this Authority as it appears in Box III. The applicant may,
within one month from the date of mailing of this international search report, submit comments to this Authority.

The figure of the drawings to be published with the abstract is Figure No.

[ ] as suggested by the applicant.    [X] None of the figures.

[ ] because the applicant failed to suggest a figure.

[ ] because this figure better characterizes the invention.



Form PCT/ISA/210 (first sheet) (July 1998)



INTERNATIONAL SEARCH REPORT 



A. CLASSIFICATION OF SUBJECT MATTER

IPC 7 C12Q1/68

International Application No
PCT/IB 99/01958

According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)

IPC 7 C12Q

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practical, search terms used)



C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category * / Citation of document, with indication, where appropriate, of the relevant passages / Relevant to claim No.

Category: X
Citation: WO 97 34015 A (PENN STATE RES FOUND)
18 September 1997 (1997-09-18)
the whole document
Relevant to claim No.: 1-24

Category: X
Citation: WO 98 12352 A (GEN HOSPITAL CORP ;UNIV LELAND STANFORD JUNIOR (US))
26 March 1998 (1998-03-26)
the whole document
Relevant to claim No.: 1-24

-/--





[X] Further documents are listed in the continuation of box C.

[X] Patent family members are listed in annex.



* Special categories of cited documents:

"A" document defining the general state of the art which is not
considered to be of particular relevance

"E" earlier document but published on or after the international
filing date

"L" document which may throw doubts on priority claim(s) or
which is cited to establish the publication date of another
citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or
other means

"P" document published prior to the international filing date but
later than the priority date claimed

"T" later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
invention

"X" document of particular relevance; the claimed invention
cannot be considered novel or cannot be considered to
involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention
cannot be considered to involve an inventive step when the
document is combined with one or more other such documents,
such combination being obvious to a person skilled
in the art.

"&" document member of the same patent family



Date of the actual completion of the international search

30 May 2000


Date of mailing of the international search report

14/06/2000


Name and mailing address of the ISA

European Patent Office, P.B. 5818 Patentlaan 2
NL-2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl,
Fax: (+31-70) 340-3016


Authorized officer

Molina Galan, E



Form PCT/ISA/210 (second sheet) (July 1992)



page 1 of 3 



INTERNATIONAL SEARCH REPORT 



C.(Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT

International Application No
PCT/IB 99/01958

Category * / Citation of document, with indication, where appropriate, of the relevant passages / Relevant to claim No.



Citation: DATABASE BIOSIS 'Online!
BIOSCIENCES INFORMATION SERVICE, PHILADELPHIA, PA, US 1998
DU YANG-ZHU ET AL: "Multiple exon screening using restriction endonuclease
fingerprinting (REF): Detection of six novel mutations in the L1 cell adhesion
molecule (L1CAM) gene."
Database accession no. PREV199800180156
XP002139122
abstract
& HUMAN MUTATION 1998,
vol. 11, no. 3, 1998, pages 222-230,
ISSN: 1059-7794
Relevant to claim No.: 1-24

Citation: DATABASE BIOSIS 'Online!
BIOSCIENCES INFORMATION SERVICE, PHILADELPHIA, PA, US 1998
GILAD SHLOMIT ET AL: "Identification of ATM mutations using extended RT-PCR and
restriction endonuclease fingerprinting, and elucidation of the repertoire of A-T
mutations in Israel."
Database accession no. PREV199800093427
XP002139123
abstract
& HUMAN MUTATION 1998,
vol. 11, no. 1, 1998, pages 69-75,
ISSN: 1059-7794
Relevant to claim No.: 1-24

Citation: WO 91 17262 A (UNIV BRITISH COLUMBIA)
14 November 1991 (1991-11-14)
the whole document
Relevant to claim No.: 1,18-24

Citation: EP 0 534 858 A (KEYGENE NV)
31 March 1993 (1993-03-31)
cited in the application
the whole document
Relevant to claim No.: 18-24

Citation: VOS P ET AL: "AFLP: a new technique for DNA fingerprinting"
NUCLEIC ACIDS RESEARCH, GB, OXFORD UNIVERSITY PRESS, SURREY,
vol. 23, no. 21, 1995, pages 4407-4414, XP002109691
ISSN: 0305-1048
cited in the application
the whole document
Relevant to claim No.: 18-24

-/--



Form PCT/ISA/210 (continuation of second sheet) (July 1992)



WANG D G ET AL: "Large-scale
identification, mapping, and genotyping of
single-nucleotide polymorphisms in the
human genome"

SCIENCE, US, AMERICAN ASSOCIATION FOR THE
ADVANCEMENT OF SCIENCE,
vol. 280, 1 January 1998 (1998-01-01),
pages 1077-1082, XP002089398

ISSN: 0036-8075
cited in the application
the whole document

WO 97 29211 A (US HEALTH ;WEINSTEIN JOHN N
(US); BOULAMWINI JOHN (US))
14 August 1997 (1997-08-14)

WO 96 17082 A (DU PONT ;MORGANTE MICHELE
(IT); VOGEL JULIE MARIE (US))
6 June 1996 (1996-06-06)

US 5 741 678 A (RONAI ZEEV A)
21 April 1998 (1998-04-21)



25-36

INTERNATIONAL SEARCH REPORT

Information on patent family members

International Application No
PCT/IB 99/01958

Patent document           Publication    Patent family      Publication
cited in search report    date           member(s)          date

WO 9734015 A              18-09-1997     AU 2324997 A       01-10-1997
                                         CA 2248981 A       18-09-1997
                                         EP 0929694 A       21-07-1999

WO 9812352 A              26-03-1998     US 6004783 A       21-12-1999
                                         EP 0948645 A       13-10-1999

WO 9117262 A              14-11-1991     AU 7777091 A       27-11-1991
                                         CA 2062943 A       31-10-1991
                                         GB 2249627 A,B     13-05-1992

EP 0534858 A              31-03-1993     EP 0969102 A       05-01-2000
                                         AT 191510 T        15-04-2000
                                         AU 672760 B        17-10-1996
                                         AU 2662992 A       27-04-1993
                                         CZ 9400669 A       15-12-1994
                                         DE 69230873 D      11-05-2000
                                         WO 9306239 A       01-04-1993
                                         FI 941360 A        24-05-1994
                                         HU 68504 A         28-06-1995
                                         JP 6510668 T       01-12-1994
                                         NO 941064 A        20-05-1994
                                         US 6045994 A       04-04-2000
                                         ZA 9207323 A       30-08-1993

WO 9729211 A              14-08-1997     AU 2264197 A       28-08-1997

WO 9617082 A              06-06-1996     AU 704660 B        29-04-1999
                                         AU 4367496 A       19-06-1996
                                         DE 69507646 D      11-03-1999
                                         DE 69507646 T      16-09-1999
                                         EP 0804618 A       05-11-1997
                                         JP 10509594 T      22-09-1998
                                         NZ 298236 A        28-01-1999
                                         US 5955276 A       21-09-1999

US 5741678 A              21-04-1998     US 5512441 A       30-04-1996
                                         AU 4157696 A       06-06-1996
                                         CA 2205392 A       23-05-1996
                                         WO 9615139 A       23-05-1996



INTERNATIONAL PRELIMINARY EXAMINATION REPORT

(PCT Article 36 and Rule 70)

Applicant's or agent's file reference
MET-001-PCT 


FOR FURTHER ACTION
See Notification of Transmittal of International Preliminary Examination Report (Form PCT/IPEA/416)


International application No. 
PCT/IB99/01958 


International filing date (day/month/year) 
09/11/1999 


Priority date (day/month/year) 
09/11/1998 


International Patent Classification (IPC) or national classification and IPC 
C12Q1/68 


Applicant 






METHEXIS N.V. et al. 







1. This international preliminary examination report has been prepared by this International Preliminary Examining Authority
and is transmitted to the applicant according to Article 36.

2. This REPORT consists of a total of 9 sheets, including this cover sheet.

□ This report is also accompanied by ANNEXES, i.e. sheets of the description, claims and/or drawings which have
been amended and are the basis for this report and/or sheets containing rectifications made before this Authority
(see Rule 70.16 and Section 607 of the Administrative Instructions under the PCT).

These annexes consist of a total of sheets.



3. This report contains indications relating to the following items:

I     ☒ Basis of the report
II    □ Priority
III   ☒ Non-establishment of opinion with regard to novelty, inventive step and industrial applicability
IV    □ Lack of unity of invention
V     ☒ Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability;
        citations and explanations supporting such statement
VI    □ Certain documents cited
VII   □ Certain defects in the international application
VIII  ☒ Certain observations on the international application



Date of submission of the demand
05/06/2000

Date of completion of this report
19.01.2001




Name and mailing address of the international 
preliminary examining authority: 

European Patent Office
D-80298 Munich

Tel. +49 89 2399 - 0 Tx: 523656 epmu d
Fax: +49 89 2399 - 4465


Authorized officer
Favre, N

Telephone No. +49 89 2399 7363



Form PCT/IPEA/409 (cover sheet) (January 1994)



INTERNATIONAL PRELIMINARY EXAMINATION REPORT

International application No. PCT/IB99/01958



I. Basis of the report 

1. This report has been drawn on the basis of (substitute sheets which have been furnished to the receiving Office in
response to an invitation under Article 14 are referred to in this report as "originally filed" and are not annexed to
the report since they do not contain amendments (Rules 70.16 and 70.17).):

Description, pages: 

1-45 as originally filed 

Claims, No.: 

1-36 as originally filed

Drawings, sheets: 

1/8-8/8 as originally filed 

Sequence listing part of the description, pages: 

1-7, as originally filed 

2. With regard to the language, all the elements marked above were available or furnished to this Authority in the
language in which the international application was filed, unless otherwise indicated under this item.

These elements were available or furnished to this Authority in the following language: , which is:

□ the language of a translation furnished for the purposes of the international search (under Rule 23.1 (b)). 

□ the language of publication of the international application (under Rule 48.3(b)). 

□ the language of a translation furnished for the purposes of international preliminary examination (under Rule 
55.2 and/or 55.3). 

3. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the 
international preliminary examination was carried out on the basis of the sequence listing: 

☒ contained in the international application in written form.

☒ filed together with the international application in computer readable form.

□ furnished subsequently to this Authority in written form.

□ furnished subsequently to this Authority in computer readable form. 

□ The statement that the subsequently furnished written sequence listing does not go beyond the disclosure in 
the international application as filed has been furnished. 

□ The statement that the information recorded in computer readable form is identical to the written sequence 
listing has been furnished. 

4. The amendments have resulted in the cancellation of:

□ the description, pages:

□ the claims, Nos.:

□ the drawings, sheets:



5. □ This report has been established as if (some of) the amendments had not been made, since they have been 

considered to go beyond the disclosure as filed (Rule 70.2(c)): 

(Any replacement sheet containing such amendments must be referred to under item 1 and annexed to this
report.)

6. Additional observations, if necessary: 

III. Non-establishment of opinion with regard to novelty, inventive step and industrial applicability

1. The questions whether the claimed invention appears to be novel, to involve an inventive step (to be
non-obvious), or to be industrially applicable have not been examined in respect of:

□ the entire international application.

☒ claims Nos. 11 and 18-36, with respect to novelty, inventive step and industrial applicability.

because:

□ the said international application, or the said claims Nos. relate to the following subject matter which does
not require an international preliminary examination (specify):

☒ the description, claims or drawings (indicate particular elements below) or said claims Nos. 18-36 are so
unclear that no meaningful opinion could be formed (specify):
see separate sheet

☒ the claims, or said claims Nos. 11 are so inadequately supported by the description that no meaningful
opinion could be formed.

□ no international search report has been established for the said claims Nos. .

2. A meaningful international preliminary examination report cannot be carried out due to the failure of the nucleotide
and/or amino acid sequence listing to comply with the standard provided for in Annex C of the Administrative
Instructions:

□ the written form has not been furnished or does not comply with the standard.

□ the computer readable form has not been furnished or does not comply with the standard.

V. Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability;
citations and explanations supporting such statement



Form PCT/IPEA/409 (Boxes I-VIII, Sheet 2) (July 1998)





INTERNATIONAL PRELIMINARY
EXAMINATION REPORT

International application No. PCT/IB99/01958

1. Statement

Novelty (N)                      Yes: Claims 3-5, 7-9, 13, 14, 16 and 17
                                 No:  Claims 1, 2, 6, 10, 12 and 15

Inventive step (IS)              Yes: Claims
                                 No:  Claims 1-10, 12-17

Industrial applicability (IA)    Yes: Claims 1-10, 12-17
                                 No:  Claims



2. Citations and explanations 
see separate sheet 

VIII. Certain observations on the International application 

The following observations on the clarity of the claims, description, and drawings or on the question whether the 
claims are fully supported by the description, are made: 
see separate sheet

Re Item III

Non-establishment of opinion with regard to novelty, inventive step and 
industrial applicability 

1. Claim 11 refers to a restriction endonuclease having a recognition sequence of
two nucleotides. Such a restriction endonuclease is not known from the prior art
and is not further described in the description. Thus claim 11 is not supported by
the description and does not meet the requirements of Article 6 PCT.

For these reasons, it was not possible to formulate an opinion with regard to
novelty and inventive step in the sense of Articles 33(2) and 33(3) PCT for claim
11.

2. Claims 18-36 refer to methods for obtaining probes which are suitable for use in
detecting endonuclease site polymorphisms and for producing microarrays
comprising said probes, said microarrays being suitable for use in detecting
endonuclease site polymorphisms. In the application as filed, there is no example
for such an endonuclease site polymorphism (Rule 5.1(a)(v) PCT). Moreover, such
an endonuclease site polymorphism is not a product produced by the
identification methods of claims 1-17, which are methods of identification. It is
rather a result thereof, i.e. a product that has been identified using the said
methods.

Therefore, claims 18-36, which encompass any endonuclease site polymorphism, 
are not considered to be supported by the description in the sense of Article 6 
PCT. 

Moreover, there is no indication whatsoever in the application as filed of any 
technical step which would allow the person skilled in the art to perform the 
methods falling within the scope of claims 18-36. Therefore, claims 18-36 are also 
considered not to be sufficiently disclosed in the application as filed for the skilled 
person to carry them out and thus do not meet the requirements of Article 5 PCT. 

For instance, independent claim 18 describes a method comprising the following 
technical features: (1) generation from a sample of target DNA fragments that can 
be simultaneously amplified and (2) selection of DNA fragments having an 
endonuclease site polymorphism.

The scope of independent claim 18 in its present formulation is so broad that it
encompasses any method of identification of polymorphic DNA sites as long as
said polymorphic site is comprised within a restriction enzyme recognition site
(see also Item VIII, point 1.). Therefore, independent claim 18 is prima facie not
novel over the methods disclosed in D1-D6 (see Item V, points 1.-1.6), which refer
to polymorphism and restriction, or even over D7 (Science, 1998, 280:1077-1082),
which refers to single nucleotide polymorphism (SNP).

According to the above objections, it is not possible to assess claims 18-36 for 
novelty and inventive step in the sense of Articles 33(2) and 33(3) PCT. 




Re Item V 

Reasoned statement under Article 35(2) with regard to novelty, inventive step or 
industrial applicability; citations and explanations supporting such statement 



1. Independent claim 1 refers to a method for detecting DNA polymorphism in
restriction enzyme recognition sites. This method comprises the following 
technical features: (1) generation from a sample of target DNA fragments that can 
be simultaneously amplified, (2) treatment of said fragments with at least one 
restriction enzyme, (3) amplification of the DNA fragments obtained after 
restriction and (4) determination of the presence of amplified fragments, wherein 
amplification is associated with the lack of recognition site for the used restriction 
enzyme(s). 

1.1 These technical features are disclosed in figure 2 of document D1 (WO-A-97
04010), which further refers to multiplex PCR (concomitant amplification) on page
14 (lines 1-13). Document D1 is thus novelty-destroying for the subject-matter of
independent claim 1.

1.2 Figure 13 of document D2 (WO-A-98 12352), which refers to alleles AC and
BD (a set of concomitantly amplifiable target DNA), also discloses all the technical
features of the method described in independent claim 1. Therefore, D2 also
destroys the novelty of claim 1.

1.3 Moreover, the wording "set of concomitantly amplifiable target DNA fragments" as
used in claim 1 does not require the amplification to take place within the same
reaction tube, i.e. two different DNA amplification reactions performed
simultaneously and linked within the scope of one large experiment could be 
considered as being concomitantly amplified, even when taking place 
simultaneously in two different thermocyclers (see also D3: WO-A-91 17262; page 
26, lines 10-13 and page 21, line 22 - page 22, line 6). Therefore, the method 
disclosed in the figure 2 of D4 (US-A-5741678) is also novelty-destroying for the 
subject-matter of claim 1 when said method is performed simultaneously for the 
assessment of some of the mutations from the list therein-disclosed (column 15, 
lines 33-42). 

Hence, independent claim 1 does not meet the requirements of Article 33(2) PCT. 

1.4 Document D1 discloses a list of different restriction enzymes having a recognition
site of four or more nucleotides (Table I, page 51). Moreover, D1 refers to
polymerase chain reaction (PCR) as a possible DNA amplification method for the
therein-described methods (page 11, line 25) and discloses several primer pairs
spanning polymorphic sites which can be used for said PCR. Therefore,
dependent claims 6, 10 and 15, which refer to these features, are not novel in the
sense of Article 33(2) PCT.

1.5 Furthermore, D1 discloses an embodiment where DNA amplification and
restriction occur simultaneously within a reaction (page 21 - page 22). Dependent
claims 2 and 12 are thus not novel and do not meet the requirements of Article
33(2) PCT.

1.6 Dependent claims 3-5, 7-9, 13, 14, 16 and 17 do not appear to contain any
features which, in combination with the features of any claim to which they refer,
meet the requirements of the PCT in respect of inventive step.

a) Extraction of genomic DNA generally leads to DNA fragments (ideally entire
chromosomes) which are often too large to be worked with. A standard
method for reducing the size of said large fragments is digestion with a
restriction endonuclease which has a long recognition site (rare cutter), as
illustrated in D1 (see point 1.5 above). However, should the so-obtained
fragments still be too large for a particular purpose, the person skilled in the
art would use a second restriction enzyme to generate smaller fragments. In
order to obtain significantly smaller fragments, which for instance would be
required for DNA amplification after ligation of adapters (see for instance D5:
"WO-A-96 17082", e.g. figure 1), the skilled person would consider using an
enzyme having a shorter recognition site (frequent cutter), i.e. having
statistically more recognition sites within one large DNA molecule, or a
combination of a rare cutter and a frequent cutter. This is illustrated in D6
(Nucleic Acids Research, 1995, 23(21):4407-4414), which discloses the use
of a combination of the restriction enzymes EcoRI (rare cutter) and MseI
(frequent cutter) for similar purposes (page 4408 column 1, line 22 - page
4409 column 1, line 3). Dependent claims 3-5, 7-9 and 16 are thus not
considered to be inventive in the sense of Article 33(3) PCT.

b) Dependent claims 13 and 14 define whether the polymorphism creates or 
deletes a restriction site. Although the prior art documents only refer to 
polymorphism, it is obvious for the person skilled in the art that both variants 
are meant. Therefore, claims 13 and 14 cannot be considered to be 
inventive in the sense of Article 33(3) PCT. 

c) Identifying PCR products by Southern blot (hybridisation to cognate DNA
probe) is state of the art. Thus, dependent claim 17 is not inventive in the
sense of Article 33(3) PCT.
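
The rare-cutter versus frequent-cutter argument in point a) above rests on simple arithmetic: in a random sequence with equal base frequencies, one specific site of n nucleotides is expected about once every 4^n base pairs. The following is a minimal sketch of that estimate, not part of the cited documents. The function names and the 1 Mb example size are hypothetical, and real genomes deviate from the equal-frequency assumption.

```python
def expected_site_spacing(site_length: int) -> int:
    """Average spacing (bp) between occurrences of one specific site.

    Assumption: random sequence with equal A/C/G/T frequencies.
    """
    return 4 ** site_length


def expected_fragment_count(sequence_length_bp: int, site_length: int) -> float:
    """Rough number of fragments expected after complete digestion."""
    return sequence_length_bp / expected_site_spacing(site_length)


if __name__ == "__main__":
    # EcoRI recognises GAATTC (6 nt); MseI recognises TTAA (4 nt).
    for name, length in (("EcoRI", 6), ("MseI", 4)):
        spacing = expected_site_spacing(length)
        count = expected_fragment_count(1_000_000, length)
        print(f"{name}: one site per {spacing} bp, about {count:.0f} fragments per Mb")
```

Under these assumptions a four-base site occurs 16 times more often than a six-base site, which is why a tetracutter yields the smaller fragments.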



Re Item VIII

Certain observations on the international application

1. The meaning of the term "restriction endonuclease" as used throughout the
description and the claims is unclear in the sense of Article 6 PCT. According to
the description (page 10, line 30 - page 11, line 3), any endonuclease which
recognises a sequence of nucleotides, i.e. at least two, is encompassed in this term.
Given the extremely low specificity of such a restriction enzyme, any single point
mutation could be considered to be encompassed within the scope of the claims.

2. In claim 2, the term "wherein" seems to be lacking, thereby resulting in a lack of 
clarity of said claim (Article 6 PCT). 

3. Step (c) of claim 25 refers to "endonuclease treated target DNA fragments of step
(b)". However, step (b) does not refer to an endonuclease treatment. This results
in a lack of clarity of the claim in the sense of Article 6 PCT.

RESTRICTED AMPLICON ANALYSIS



Field of the Invention 

The present invention generally provides a method which facilitates the 
detection of polymorphisms (or mutations). The method is directed to the analysis of
so-called endonuclease site polymorphisms (ESPs) that result in the gain or loss of a 
restriction endonuclease site. In essence, the ESP is probed with the restriction 
endonuclease reagent prior to amplification, whereby amplification is prevented and 
consequently no signal is observed when cleavage takes place. Unambiguous allele 
calling is performed by comparing the signals obtained with and without cleavage with 
the restriction endonuclease reagent. The method is particularly useful for multiplex 
genotyping, involving the parallel analysis of large numbers of single nucleotide 
polymorphisms. Preferred methods for detecting the amplicons involve hybridization 
to an arrayed or otherwise identifiable set of cognate probe fragments or 
oligonucleotides. 
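
The allele-calling principle stated above, comparing the signal obtained with and without cleavage, can be illustrated with a minimal sketch. The function name, the detection threshold and the residual-signal ratio are hypothetical values chosen for illustration only; they are not taken from the specification.

```python
def call_probe_site(signal_uncleaved: float, signal_cleaved: float,
                    detect_threshold: float = 100.0,
                    absent_ratio: float = 0.1) -> str:
    """Compare amplicon signals measured without and with probing-enzyme cleavage.

    detect_threshold and absent_ratio are illustrative assumptions.
    """
    # Without a signal in the uncleaved control the amplicon itself failed,
    # so nothing can be concluded about the restriction site.
    if signal_uncleaved < detect_threshold:
        return "no call"
    ratio = signal_cleaved / signal_uncleaved
    # Cleavage prevents amplification: signal lost means the site is present.
    if ratio <= absent_ratio:
        return "site present"
    # Signal survives cleavage: at least one allele lacks the site.
    return "site absent on at least one allele"


if __name__ == "__main__":
    print(call_probe_site(1000.0, 20.0))
    print(call_probe_site(1000.0, 900.0))
    print(call_probe_site(50.0, 10.0))
```

The uncleaved control is what makes the call unambiguous in this sketch: a missing signal after cleavage is only informative when the same amplicon is detected without cleavage.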

Background of the Invention 

Molecular approaches for genetic analyses trace the nucleotide sequence 
variation that occurs naturally and randomly in the genomes of all living species.
Knowledge of the DNA polymorphisms among individuals and between populations
is important in understanding the complex links between genotypic and phenotypic
variation. In the absence of complete data about sequence variation, one relies on the
ability to identify 'nearby' markers that allow one to infer the location of certain relevant
loci or causal sequence variations. The informativeness of the marker depends on the
magnitude of the linkage disequilibrium. Markers can be used in linkage studies to
search for candidate genes and in association studies to identify the functional allelic 
variation on candidate genes that influence inter-individual variation. 

The vast majority of sequence variation consists of nucleotide
substitutions, often referred to as single nucleotide polymorphisms (SNPs), resulting
from mutations that have accumulated during evolution. Most of these nucleotide
changes are genetically silent; i.e., they have no measurable biological effect, but
provide an immense reservoir of variation in DNA structure. Most methods for genetic
analysis used today rely on the detection of nucleotide sequence variation which can
be measured by DNA fragment analysis using electrophoretic separation, in which
DNA fragments are fractionated based on size or conformation. Occasionally the
nucleotide sequence variation will affect either the presence of the DNA fragment or
its mobility. In this way the primary nucleotide sequence variation will give rise to
easily detectable DNA fragment polymorphism. Since polymorphic DNA fragments
are derived from precise locations on the organism's genome, they can serve as reliable
genetic markers, or landmarks to identify and locate genes.

A host of assays to detect DNA polymorphisms, and SNPs in particular,
have been developed. In some of these assays (e.g., RFLP [Botstein, D., White, R.L.,
Skolnick, M., Davis, R.W., Am. J. Hum. Genet. 32:314-331 (1980)], CAPS
[Konieczny, A., Ausubel, J.F., Plant J. 4:403-410 (1993)], dCAPS [Neff, M.M., Neff,
J.D., Chory, J., Pepper, A.E., The Plant Journal 14:387-392 (1998)], PIRA
[Steinborn, R., Müller, M., Brem, G., Biochim. Biophys. Acta 1397:295-304 (1998)]),
restriction enzymes are used to detect polymorphic nucleotide sequences that affect
cleavage. The specificity of restriction enzymes is such that they exhibit a unique
sensitivity to detect single nucleotide differences occurring in their recognition sites.
The principal strengths of restriction enzyme-based genetic analyses are the ease of use
and the robustness of the assays. In the majority of the cases, the restriction site
polymorphism is used to detect known, previously identified SNPs and the assay
consists of an electrophoretic fragment analysis. In one report, the allelic variation
is detected in a solid-phase ELISA-type setting [Truett, G.E., Walker, J.A., Wilson,
J.B., Redmann, S.M. Jr., Tulley, R.T., Eckardt, G.R., Plastow, G., Mamm. Genome
9:629-632 (1998)].

In WO 91/17269, Lerner et al. describe a different method for mapping
a eukaryotic chromosome by restriction endonuclease mapping of discrete DNA
sequences which are complementary to a region of a eukaryotic chromosome.

Vos et al., Nucl. Acids Res. 23:4407-4414 (1995) and EP 0 534 858
describe a technique for DNA fingerprinting called AFLP which is based on the
selective polymerase chain reaction based amplification of restriction fragments of a
digest of genomic DNA. The amplification reaction depends on the use of primers that
extend into restriction fragments, amplifying only those fragments in which the primer
extensions match the nucleotide sequence flanking the restriction sites.

Another method utilizing DNA amplification steps is set out in Williams
et al., Nucl. Acids Res. 18:6531-6535 (1990), who describe a DNA fingerprinting
method termed random amplified polymorphic DNA.

DNA amplification fingerprinting was described by Caetano-Anollés in
Bio/Technology 9:553-557 (1991). Still another fingerprinting technique called
arbitrarily primed PCR was described in Welsh et al., Nucl. Acids Res. 18:7213-7218
(1990) and Welsh et al., Nucl. Acids Res. 19:861-866 (1991).

In WO 94/11530, Cantor et al. describe materials and methods for
positional sequencing by hybridization. Cantor et al. also describe methods for
creating arrays of DNA probes useful in the practice of their method.

The major shortcoming of the current methods of genetic analysis is the
limited resolution of the DNA fragment analysis systems, namely the number of DNA
fragments that can be separated in a single assay. Generally the fractionation resolution
ranges from tens to a couple of hundred DNA fragments, at the most. Consequently,
current genetic analysis methods are limited to a few hundred to a thousand genetic
markers. While this resolution has been sufficient for analyzing simple genetic traits
determined by single genes, the analysis of complex traits, which is now being
undertaken and which involve several or many different genes, will require the analysis
of a much larger number of genetic markers. It is anticipated that such studies will
require from a few thousand to possibly several hundred thousand genetic markers.
Although this could conceivably be accomplished by performing many parallel assays,
such scaling up will be cost- and labor-prohibitive.

A technology that has great potential and which is generating widespread
interest is the so-called micro-array technology (DNA chips). In general, these
methods are based on measurement of the hybridization of DNA sequences in solution
to probe sequences that are arrayed on a solid surface. When assaying nucleotide
polymorphisms, the detection relies on the small differences in hybridization efficiency
between two different DNA sequences. In one format, fluorescently labeled sample
DNA is hybridized to dense arrays of probe nucleic acids, sequence-specific
hybridization signal is detected by scanning confocal microscopy, and DNA variants
are scored as (predictable) differences in the hybridization pattern. The micro-arrays are
fabricated either by in-situ light-directed oligonucleotide synthesis [Fodor, S.P.A. et
al., Science 251: 767 (1991)] or by spotting DNA (off-chip synthesized
oligonucleotides or PCR fragments) in an automated procedure. The technology has
already been demonstrated in the scoring of mutations in mitochondrial DNA [Chee,
M. et al., Science 274: 610-614 (1996)], the HIV genome [Lipshutz, R.J. et al.,
Biotechniques 19: 442-447 (1995)], the CFTR cystic fibrosis gene [Cronin, M.T. et
al., Human Mut. 7: 244-255 (1996)], the BRCA1 breast cancer gene [Hacia, G.H. et
al., Nat. Genet. 14: 441-447 (1996)] as well as the entire yeast genome [Winzeler,
E.A. et al., Science 281:1194 (1998)]. In comparison with most other assays,
micro-arrays provide a platform for high-throughput, massively parallel polymorphism
detection.

A major disadvantage with the use of microarrays relates to the
complexity of the hybridization reaction. The detection relies on the very small
difference in hybridization of DNA sequences differing by only one nucleotide. In
general, a set of 4 oligonucleotides, differing only in the identity of the central base,
is synthesized for each position in the target sequence that has to be interrogated. In
practice, the number of oligonucleotides needed to correctly genotype one SNP is much
larger, involving up to 56 different oligonucleotides spanning the variable base [Wang
et al., Science 280: 1077-1082 (1998)]. The degree of redundancy is also dramatic if
one wants to screen the target DNA for all possible mutations; the design then includes
overlapping oligonucleotide-sets that are offset by one base (a process known as tiling).
It should be noted that the detection of SNPs by hybridization to arrays depends on the
use of short oligonucleotide probes. With longer probes such as DNA fragments in the
size range of 50 to 500 base pairs or larger, it is not possible to distinguish the SNP
alleles.
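
The probe redundancy described above can be made concrete with a short sketch: four oligonucleotides per interrogated position, differing only at the central base, and sets offset by one base for tiling. The function names and the default flank length are hypothetical. As a simplification, probes are written in the sense of the target strand rather than as its complement.

```python
from typing import Dict, List


def probe_set(target: str, position: int, flank: int = 12) -> List[str]:
    """Return four probes centred on `position`, one per possible central base."""
    if position - flank < 0 or position + flank >= len(target):
        raise ValueError("position too close to the end of the target")
    left = target[position - flank:position]
    right = target[position + 1:position + flank + 1]
    return [left + base + right for base in "ACGT"]


def tile(target: str, flank: int = 12) -> Dict[int, List[str]]:
    """Overlapping probe sets offset by one base across the target (tiling)."""
    return {pos: probe_set(target, pos, flank)
            for pos in range(flank, len(target) - flank)}


if __name__ == "__main__":
    tiled = tile("ACGTACGTAC", flank=2)
    total = sum(len(probes) for probes in tiled.values())
    print(len(tiled), "positions,", total, "probes")
```

Even this minimal design needs four probes for every interrogated base, which illustrates why the oligonucleotide count grows so quickly for tiled arrays.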

Summary of the Invention

The present invention is directed to methods for genotyping
polymorphisms that result in the gain or loss of an endonuclease cleavage site. Such
polymorphisms are referred to hereinafter as endonuclease site polymorphisms (ESPs).
Polymorphisms detectable according to the methods of the present invention include
single nucleotide polymorphisms (SNPs). The methods of the present invention exploit
the high discriminatory power of restriction enzymes in a "Restricted Amplicon Assay"
(RAA) which generally comprises the following steps (see Figure 1):

(a) isolating sample DNA;

(b) deriving a set of target DNA fragments, said set of target
fragments comprising concomitantly amplifiable target DNA fragments from the
sample DNA;

(c) treating the target DNA fragments obtained in step (b) with a probe
restriction endonuclease reagent;

(d) amplifying the amplifiable probe restriction endonuclease reagent
treated target DNA fragments of step (c); and

(e) analyzing the DNA of step (d) to determine which target
fragments are amplified and/or which target fragments are not amplified; and wherein
amplified target fragments lack a recognition site for the probe restriction endonuclease
reagent and target fragments having a recognition site for a probe restriction
endonuclease reagent are not amplified.
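
The logic of steps (c) to (e) can be sketched in silico: a target fragment is predicted to amplify only if it lacks the recognition site of the probe endonuclease. This is an illustrative simplification that assumes complete digestion and exact site matching. The function names and the example sequences are hypothetical and do not come from the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_cleaved(fragment: str, site: str) -> bool:
    """True if the site occurs on either strand of the fragment."""
    fragment, site = fragment.upper(), site.upper()
    return site in fragment or reverse_complement(site) in fragment


def restricted_amplicon_assay(fragments: dict, probe_site: str) -> dict:
    """Map each fragment name to True if it is predicted to amplify."""
    return {name: not is_cleaved(seq, probe_site)
            for name, seq in fragments.items()}


if __name__ == "__main__":
    # Two hypothetical alleles differing by one base inside a TTAA site.
    targets = {
        "allele_with_site": "GGCCTTAAGGCC",
        "allele_without_site": "GGCCTTGAGGCC",
    }
    print(restricted_amplicon_assay(targets, "TTAA"))
```

Checking both strands matters only for non-palindromic sites; for a palindromic site such as TTAA the two tests are equivalent.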

In one aspect, the present invention is directed to RAA-methods, which
comprise the preparation of concomitantly amplifiable DNA segments by digestion of
the starting DNA with one or more restriction endonucleases, collectively referred to
herein as sampling enzymes. This method is herein referred to as format-I RAA and
is diagrammed in Figure 2. The digested starting DNA may be further modified at its
termini by the addition of adapters, which may serve to prime an amplification reaction
(see Figure 2). Once sample DNA is obtained, it is treated with a different restriction
enzyme, the probing enzyme, also referred to as a probe restriction endonuclease
reagent. A combination of probing and sampling enzymes is chosen such that a
substantial fraction of the sample fragments contain a single recognition site for the
probe endonuclease reagent. In general, probe enzymes used with format-I RAA
preferably have as a recognition site a nucleotide sequence of less than six nucleotides.
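
The selection criterion above, that a substantial fraction of sample fragments should carry a single probe site, can be estimated on a known sequence. The sketch below is hypothetical throughout, including function names and the example sequence. It assumes palindromic sites and, as a simplification, cuts at the first base of the sampling site rather than at the enzyme's true cleavage position.

```python
import re
from typing import List


def site_positions(sequence: str, site: str) -> List[int]:
    """Start positions of all (possibly overlapping) occurrences of a site."""
    pattern = "(?=" + re.escape(site.upper()) + ")"
    return [m.start() for m in re.finditer(pattern, sequence.upper())]


def digest(sequence: str, site: str) -> List[str]:
    """Split a sequence at every occurrence of a sampling-enzyme site."""
    bounds = [0] + site_positions(sequence, site) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def single_site_fraction(fragments: List[str], probe_site: str) -> float:
    """Fraction of fragments carrying exactly one probe-enzyme site."""
    if not fragments:
        return 0.0
    singles = sum(1 for f in fragments
                  if len(site_positions(f, probe_site)) == 1)
    return singles / len(fragments)


if __name__ == "__main__":
    dna = "AAGAATTCCCTTAACCGAATTCGGTTAATTAAGG"
    sample_fragments = digest(dna, "GAATTC")  # sampling site (EcoRI)
    print(sample_fragments)
    print(single_site_fraction(sample_fragments, "TTAA"))  # probe site (MseI)
```

Running such an estimate for several candidate enzyme pairs is one way to compare combinations before committing to an assay design.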

In another aspect, the present invention is directed to methods for
format-II RAA for the detection of ESPs, as diagrammed in Figure 3. Format-II RAA
operates on the same principle as format-I RAA except that the sample amplicons need
not be DNA fragments, but are rather defined regions of a genome amplifiable with
specific primer pairs. The amplicons of the format-II RAA are identified on the basis
of sequence data; e.g. the sequence of ESP-containing restriction fragments identified
using the format-I RAA method, or otherwise known SNPs affecting endonuclease cleavage
sites. In format-II RAA, the test DNA to be analyzed is treated with a probe restriction
endonuclease reagent, followed by the concomitant amplification of regions of the
treated DNA (amplicons) using predetermined primers using, for example, the
polymerase chain reaction as described herein. The analysis of the amplification
products then proceeds as described in the format-I RAA methods described herein.
As with format-I RAA, an ESP is genotyped by the presence or absence of a
recognition site for the probe restriction endonuclease reagent.

In yet another aspect, the present invention is directed to methods for
format-III RAA. In essence, format-III RAA consists of a combination of the format-I
and format-II approaches. One such combination is diagrammed in Figure 4. Test
DNA, digested or not with a probe endonuclease reagent, is sampled with a pair of
endonuclease reagents and the resulting fragments are co-amplified as described in the
format-I assay (this step is referred to as the pre-amplification step). These
pre-amplification mixtures are, in turn, used as templates for a format-II type of PCR
reaction in which multiple ESP-containing regions are selectively co-amplified using
specific primer sets. The analysis of the amplification products then proceeds as
described before. The advantages of format-III RAA are that the stepwise
amplification facilitates the multiplex PCR of the ESP-specific amplicons and lowers
the amount of starting material required to interrogate all the ESPs.

Arrays, or microarrays of probe DNA wherein the probe DNAs are
useful in the detection of ESPs are also encompassed by the present invention.
Informative probe DNAs are prepared and identified as described in detail below and
are then attached to a substrate for use in the hybridization reactions with concomitantly
amplifiable DNA after treatment with a probe restriction endonuclease reagent and
subsequent amplification.

Since the method of the invention is based on the detection of a
particular kind of DNA polymorphism, which occurs in DNA of any organism, the
invention will be universally applicable. The methods of the present invention may be
used to genotype ESPs in a wide variety of organisms from prokaryotic organisms,
such as bacteria, through complex eukaryotic organisms, viruses, or any organism
having a genome however simple or complex. The methods may also be used for the
analysis of extrachromosomal DNA, the DNA found in certain cellular organelles,
cDNA preparations, or DNA libraries, such as yeast artificial chromosome libraries
and others. Furthermore, based on the large body of DNA sequence data at hand, it
is predicted that the genomes of higher organisms carry several hundreds of thousands
of such DNA polymorphisms. Consequently, the new method is capable of diagnosing
the immense number of genetic markers that are needed to unravel complex traits. The
method is of tremendous value for high throughput genetic analysis in the emerging
field of pharmacogenomics. Similarly, the method has great potential in the field of
animal and plant breeding, where high resolution genetic analysis will be needed to
identify the genes involved in quantitative agronomic traits.

Various aspects of the present invention are described in more detail
below (see Detailed Description of the Invention). Variations in each of these aspects
will be readily appreciated by one of ordinary skill in the art and are within the scope
of the invention.

Brief Description of the Drawings 
Figure 1 depicts the general concept of the Restricted Amplicon Assay.
The vertical arrows indicate the positions of the ESPs. The open circles denote the
probing enzyme sites that are present, while the closed circles denote the mutated sites.
The first step involves cleavage of the test DNA with the probing endonuclease. The
second step involves PCR amplification of DNA segments comprising the ESPs. The
small horizontal arrows denote the PCR primers flanking the ESPs. When cleavage
occurs the DNA is cut between the PCR primers, preventing the subsequent
amplification of the DNA segment comprising those ESPs. Only those DNA segments
that were not cleaved are amplified. The final step comprises assaying the amplicons.

Figure 2: Diagrammed representation of format-I RAA. The vertical
arrows indicate the positions of the ESPs, with the open and closed circles denoting the
probing enzyme sites that are respectively present and absent. Step 1 represents the
sampling enzyme cleavage step. The vertical dotted arrows indicate the positions of the
sampling enzyme cleavage sites. Step 2 represents the adapter ligation step. The open
lines represent the adapters ligated to the ends of the sampled restriction fragments.
Step 3 represents the probing enzyme cleavage step and the small horizontal arrows
denote the PCR primers matching the adapter sequences. Step 4 represents the PCR
amplification step in which only the sample fragments that are not cleaved by the
probing enzyme are amplified. The crossed circles represent the fragments that are not
amplified.

Figure 3: Diagrammed representation of format-II RAA. The vertical
arrows indicate the positions of the ESPs, with the open and closed circles denoting the
probing enzyme sites that are respectively present and absent. Step 1 represents the
probing enzyme cleavage step. The dotted boxes denote the DNA sequences flanking
the ESP sites. Step 2 represents the PCR primer design. The small horizontal arrows
denote the PCR primers flanking the ESPs. Step 3 represents the PCR amplification step
in which only the sample fragments that are not cleaved by the probing enzyme are
amplified. The crossed circles represent the fragments that are not amplified.

Figure 4: Diagrammed representation of format-III RAA. The vertical
arrows indicate the positions of the ESPs, with the open and closed circles denoting the
probing enzyme sites that are respectively present and absent. Step 1 represents the
sampling enzyme cleavage step. The vertical dotted arrows indicate the positions of the
sampling enzyme cleavage sites. Step 2 represents the pre-amplification step in which
the sampled fragments are amplified. Step 3 represents the probing enzyme cleavage
step. Step 4 represents the PCR primer design. The small horizontal arrows denote the
PCR primers flanking the ESPs. Step 5 represents the PCR amplification step in which
only the sample fragments that are not cleaved by the probing enzyme are amplified.
The crossed circles represent the fragments that are not amplified.

Figure 5: Graphic representation of target fragments produced by 
cleavage with a hexacutter (full arrows) and a tetracutter (dotted arrows) restriction
enzyme. Two types of fragments are produced: type I fragments (dotted lines) carrying 
two tetracutter ends and type II fragments (full lines) carrying one hexacutter end 
(represented by the arrowhead) and one tetracutter end. Upon PCR amplification only 
the type I fragments are amplified. 

Figure 6: EcoRI-BfaI fragments from ecotypes Columbia (C) and
Landsberg (L) obtained after selective amplification using EcoRI and BfaI AFLP
primers with respectively 2 and 3 selective nucleotides. The fragment patterns were
obtained respectively without probing enzyme (no enzyme) and after digestion with the
MseI probing enzyme. It is noted that most of the larger fragments do not survive after
MseI digestion, while the majority of the smaller fragments survive the treatment. The
differences between the ecotypes Columbia (C) and Landsberg (L) observed after MseI
digestion, marked by the arrows, represent ESP-carrying fragments. The differences
found without MseI digestion, marked by the stars, represent typical AFLP
polymorphisms.

Figure 7: Hybridization patterns obtained on the Arabidopsis
micro-arrays. The layout of the Arabidopsis micro-array is as follows: the left panel contains
the ESP fragment probes derived from Columbia (upper half) and Landsberg (lower
half), while the right panel contains the control monomorphic probes with respectively
the negative control fragments (-control) always carrying a probing endonuclease site
and the positive control fragments (+control) carrying no probing endonuclease site.
The upper part of the figure shows the hybridization patterns obtained with uncleaved
sample DNA, while the lower part of the figure shows the hybridization patterns
obtained with cleaved sample DNA. The circle code is as follows: light-grey
circles represent hybridization with the Cy3-labeled fragments, dark-grey circles
represent hybridization with the Cy5-labeled fragments, black circles represent
hybridization with both the Cy3-labeled and the Cy5-labeled fragments, and open
circles represent no hybridization. In this figure a set of idealized results is

RECTIFIED SHEET (RULE 91) 
presented. The hybridization patterns with the uncleaved sample DNA show that all
probes detect sequences in both ecotypes, while the hybridization patterns with the
cleaved sample DNA show that the ESP fragment probes detect only the sequences in
the respective ecotypes from which the ESP fragments were isolated. In addition,
fragments carrying no site for the probing enzyme detect sequences in both ecotypes,
while fragments that always carry a site for the probing enzyme do not show a
hybridization signal.

Figure 8: Hybridization patterns obtained on the corn micro-arrays. The
layout of the corn micro-array is as follows: the left panel of probes contains random
fragments derived from B73, while the right panel contains Mo17-fragments. The
figure shows four hybridization patterns obtained with respectively uncleaved sample
DNA, and MseI-cleaved, Tsp509I-cleaved and AluI-cleaved sample DNA. The
uncleaved sample DNA hybridization pattern shows probes that hybridize only to B73
(light-grey circles), respectively Mo17 (dark-grey circles) fragments, which represent
polymorphisms resulting from mutations in the sample enzyme recognition sites. The
cross in the circle indicates that these probes are eliminated from the analysis. The
cleaved sample DNA hybridization patterns show that the majority of the probes do not
give a hybridization signal, indicating that their cognate fragments are cleaved by the
probing enzyme. Most of the probes giving a signal hybridize to both sample DNAs.
Those that hybridize to only one of the sample DNAs and that were not eliminated
represent fragments carrying ESPs. The arrows denote the probes that were retained
for further analysis.

Detailed Description of the Invention

The term "SNP" means Single Nucleotide Polymorphism, i.e. a 
polymorphism involving the mutation of a single base-pair. 

The term "ESP" means Endonuclease Site Polymorphism, i.e. a 
polymorphism involving two alleles one of which is cleaved by an endonuclease 
reagent while the other exhibits (at least partial) resistance to cleavage by the same 
endonuclease under the same conditions. 

The phrase "(restriction) endonuclease reagent" refers to a reagent that
consists of one or more enzymes and that cleaves nucleic acids with a certain
specificity, i.e. cleavage involves recognition of a particular sequence or set of
sequences in the target DNA. Endonuclease reagents include but are not limited to the
common type II restriction enzymes.

The term "sampling endonuclease(s)" or "sampling enzyme(s)" refers to
an endonuclease reagent used to derive sets of fragments from the sample DNA.

The term "probing endonuclease(s)" or "probing enzyme(s)" refers to an endonuclease
reagent used to probe the allelic state at specific ESP-sites.

The term "polymorphism" refers to the existence of two or more alleles
at significant frequencies (≥1%) in the population; polymorphism at a single
chromosomal location constitutes a genetic marker.

The term "micro-satellite (DNA)" refers to a small array (often less than
0.1 kb) of tandem repeats of a very simple sequence, often 1 to 4 base pairs. Variability
at such a locus is the basis of many genetic markers.

The term "mutation" means a heritable alteration in the DNA sequence.

The term "allele" refers to one of several alternative sequence variants
at a specific locus.

The term "genotype" is commonly known to mean (i) the genetic
constitution of an individual, or (ii) the types of allele found at a locus in an individual.

The term "haplotype" refers to the genotype at a series of linked loci on
a single chromosome.

The term "sample DNA" or "sample fragments" refers to the set of 
fragments or amplicons derived from the starting DNA by the RAA method. 

The term "zygosity" refers to the homozygous or heterozygous state. 

The term "homozygosity/homozygous" refers to the presence of identical 
25 alleles at a locus. 

The term "heterozygosity/heterozygous" refers to the presence of 
different alleles at a locus. 

The term "CpG" means a dinucleotide with a cytosine at the 5'-side and
a guanine at the 3'-side. CpG is relatively rare in mammalian DNA because of the
tendency for the cytosine to be methylated and subsequently mutate to thymine by
deamination.

The term "ecotype" refers to a naturally occurring (plant) variety; race.

The term "bi-allelic" refers to a polymorphic locus characterized by two
different alleles.

The terms "microarray" and "(DNA-)chip" refer to a multitude of
spatially addressable nucleic acids that serve as probes. The microarray may be used
in the form of a planar solid support, a bead, a sphere, or a polyhedron. Fabrication
is done either by in situ combinatorial synthesis of oligonucleotides using
photolithography, or by robotic spotting of off-chip prepared DNA onto a solid
surface.

The methods of the present invention differ conceptually from
previously described restriction enzyme-dependent assays (supra) that essentially detect
a fragment length polymorphism. With the present method, starting DNA is restricted
prior to the amplification reaction and, rather than analyzing the obtained amplification
product, the presence or absence of amplification is measured to determine the allelic
state at an ESP site. The treated DNA is preferably amplified by using a polymerase
chain reaction and is preferably analyzed by means of hybridization against arrays of
probe DNAs. With the present method, a sample-amplicon, and consequently a
hybridization signal, is either present or virtually absent. This feature represents a
major advantage in that it results in a more accurate distinction between variable
nucleotides than is possible by differential hybridization to allele-specific
oligonucleotides, and because it greatly facilitates the identification of a set of generally
useful hybridization conditions. Also, the methods of the invention permit the use of
both oligonucleotides and DNA fragments as probe DNAs. While hybridization
to arrays allows the simultaneous analysis of a large number of ESPs, it should be clear
that the amplification of sample DNA, treated with probe restriction endonuclease
reagent, can be analyzed by any of a variety of methods well known in the art. In these
methods, an ESP is identified either by the presence of a recognition site for the probe
restriction endonuclease reagent (which will result in the failure of the sample DNA to
amplify) or by the loss of a recognition site, which will allow amplification of an
otherwise unamplifiable sample DNA. Alternative methods include, but are not limited
to, gel-electrophoretic analysis and the TaqMan assay [Holland P. M. et al., Proc.
Natl. Acad. Sci. USA 88: 7276-7280 (1991); with the latter assay detection is done during
rather than after the amplification reaction].
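
The present/absent principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two fragment sequences are hypothetical, the recognition sequence TTAA is the MseI site named in Example 1, and the function names are invented for this sketch. A fragment that still contains the probing-enzyme site is treated as cleaved and therefore not amplifiable.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def has_site(fragment: str, site: str) -> bool:
    """True if the recognition sequence occurs on either strand."""
    fragment = fragment.upper()
    return site in fragment or reverse_complement(site) in fragment


def amplicon_expected(fragment: str, probe_site: str) -> bool:
    """A fragment cut by the probing enzyme cannot be amplified end to end."""
    return not has_site(fragment, probe_site)


# Hypothetical alleles of one sample fragment (illustrative sequences only).
allele_with_site = "GGATCCAGTTAACTGCA"     # contains TTAA
allele_without_site = "GGATCCAGTCAACTGCA"  # single-base change removes TTAA

print(amplicon_expected(allele_with_site, "TTAA"))
print(amplicon_expected(allele_without_site, "TTAA"))
```

The sketch ignores incomplete digestion and any sequence context beyond the recognition site itself.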

One of the advantages of the method of the invention is the ability to
calibrate the measured signal against that obtained in a control experiment where
digestion with the probe restriction endonuclease reagent is omitted. Comparison of the
respective hybridization signals, following various corrections and normalization
procedures, is essential for the genotyping of ESPs and the accurate determination of
the zygosity. The cleaved and uncleaved material can, in principle, be hybridized
separately but a preferred method consists of hybridizing a mixture of the differentially
labeled samples to the same array. The present invention is exemplified by several
specific formats described below.
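
As a rough illustration of how the cleaved signal might be calibrated against the uncleaved control, the following Python sketch turns the ratio of the two signals into a zygosity call. The cut-off values (0.25 and 0.75) and the function name are assumptions made for this sketch; the text does not specify thresholds, and real data would first need the corrections and normalization mentioned above.

```python
def call_esp_genotype(cleaved_signal: float, uncleaved_signal: float,
                      low: float = 0.25, high: float = 0.75) -> str:
    """Call zygosity from the cleaved/uncleaved signal ratio.

    The thresholds `low` and `high` are illustrative assumptions.
    """
    if uncleaved_signal <= 0:
        raise ValueError("uncleaved control signal must be positive")
    ratio = cleaved_signal / uncleaved_signal
    if ratio < low:
        # Both alleles were cut by the probing enzyme: no amplicon remains.
        return "homozygous, site present"
    if ratio > high:
        # Neither allele was cut: signal equals the control.
        return "homozygous, site absent"
    # Roughly half of the template survived digestion.
    return "heterozygous"


print(call_esp_genotype(50, 1000))
print(call_esp_genotype(480, 1000))
print(call_esp_genotype(950, 1000))
```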

(I) Format-I RAA: Choice of sampling and probing restriction
endonuclease reagents. In one of its embodiments the present invention is directed
to methods for detecting ESPs in a "restricted amplicon assay" (RAA) which comprises
preparing concomitantly amplifiable restriction fragments from the starting DNA
(sample DNA). When generating discrete sets of DNA fragments from genomic DNA,
the following parameters are important: the average fragment size and the total number
of fragments. The optimal fragment size for use in the methods (and materials) of the
present invention is a trade-off: the fragments must be sufficiently small for
amplification with roughly equal efficiency (in general <500 base pairs) and large
enough to have on average one cleavage site for the probing endonuclease reagent.
In addition to average fragment size, the number of fragments determines the
complexity of the sample DNA, which is critical in view of the limitations of the
detection sensitivity of micro-array hybridization. In general, the current state of the
art of microarray hybridization is such that the number of sample fragments should not
exceed 100,000. All of the above-mentioned requisites can be met by the appropriate
choice of sampling and probing enzymes. A preferred method of the present invention
to prepare sample DNAs (amplicons) involves the use of two different sampling
enzymes, a rare cutter endonuclease (e.g., hexacutter) combined with a frequent cutter
endonuclease (e.g., tetracutter), as described in EP 0 534 858 A1, which describes a
method called AFLP and which is incorporated herein by reference. As can be seen
from Figure 5, the rare cutter enzyme produces large fragments that upon cleavage
with the frequent cutter enzyme are cut into a number of smaller fragments. This dual
cleavage generates two types of fragments: the majority having both ends produced by
the frequent cutter (type I) and a minority of fragments having a rare cutter end and a
frequent cutter end (type II). After ligating different adapters to each of the ends and
using appropriate primers targeted to the ends of the fragments, only the type II
fragments will be amplified efficiently (see Figure 5). The type I fragments amplify
with greatly reduced efficiency, presumably because the synthetic sequences at the two
ends constitute an inverted repeat. In general the type II fragments will amplify
synchronously using a single PCR primer pair that attaches to the ends of the
fragments. The size limit is typically around 500 base pairs, but can be increased by
using a different DNA polymerase and other reaction conditions. Thus, as outlined
above, the number of amplifiable fragments will be determined primarily by the choice
of the rare cutter restriction enzyme. By approximation, this number equals two times
the number of cleavage sites for the rare cutter. In a preferred embodiment, restriction
enzymes recognizing 6 nucleotides (hexacutters) or more are used as rare cutters. The
use of a frequent cutter recognizing 4 nucleotides (tetracutter) as second sampling
enzyme results in the production of fragments in the optimal size range for
co-amplification. As probe restriction endonuclease reagents, different tetracutter or
pentacutter enzymes can be used. The probe restriction endonuclease reagent and the
frequent cutter sampling enzyme should preferably be chosen such that the ratio of the
cleavage frequencies of probing over sampling reagent is >0.5 and <3. This will
ensure that a substantial fraction of the target fragments is cleaved once by the
probing enzyme. It is noted that ESPs cannot be genotyped when the fragments are
cleaved more than once by the probing enzyme. Also, it should be recognized that
cleavage with the probe restriction endonuclease reagent results in a significant
reduction (typically 2-4 fold) of the fragment complexity.
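
The relationship between rare-cutter sites and type II fragments can be checked with a small in silico double digest. The Python sketch below is a simplification under stated assumptions: cuts are placed at the first base of each recognition sequence, fragments running to the ends of the input sequence are ignored, and the toy sequence is invented for illustration. The recognition sequences GAATTC (EcoRI) and CTAG (BfaI) correspond to the enzymes used in Example 1.

```python
import re


def cut_positions(seq: str, site: str) -> list[int]:
    """Start index of every (possibly overlapping) occurrence of `site`."""
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]


def double_digest(seq: str, rare_site: str, frequent_site: str) -> tuple[int, int]:
    """Count (type I, type II) fragments of a rare/frequent cutter digest.

    Type I: both ends made by the frequent cutter.
    Type II: one rare-cutter end and one frequent-cutter end.
    """
    cuts = sorted(
        [(pos, "rare") for pos in cut_positions(seq, rare_site)]
        + [(pos, "freq") for pos in cut_positions(seq, frequent_site)]
    )
    type_i = type_ii = 0
    for (_, left), (_, right) in zip(cuts, cuts[1:]):
        if left == "freq" and right == "freq":
            type_i += 1
        elif left != right:
            type_ii += 1
    return type_i, type_ii


# Toy sequence: one GAATTC site flanked by CTAG sites on both sides.
toy = "AA" + "CTAG" + "AAAA" + "GAATTC" + "AAAA" + "CTAG" + "AA" + "CTAG"
type_i, type_ii = double_digest(toy, rare_site="GAATTC", frequent_site="CTAG")
print(type_i, type_ii)
```

In the toy sequence the single rare-cutter site yields two type II fragments, in line with the "two times the number of cleavage sites" approximation given above.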

Alternative schemes - different from the one described above - that meet
the requisites of sample complexity, average fragment size, and occurrence frequency
of the probe reagent and that will perform equally well, will be readily apparent to one
of ordinary skill in the art. Alternative schemes may include the use of pairs of
frequent cutters, followed by selective amplification (described in EP 0 534 858 A1),
or the use of type IIS restriction enzymes. Type IIS restriction enzymes are
characterized by an asymmetric recognition sequence. Most of these enzymes cleave
at a defined distance to one side of the recognition site and generate single stranded
overhangs that have different sequences. Ligation of adaptor sequences that are
complementary to only one type of overhang allows the amplification of specific
subsets of fragments [Kikuya Kato, Nucleic Acids Res. 23: 3685-3690 (1995)]. With
this strategy the set of fragments obtained with the sampling enzymes can be broken
up into a defined number of complementary and roughly equally complex subsets. Thus,
with these enzymes it is possible to tune the complexity of the sample. The same
strategy can be applied by making use of type II enzymes that have an interrupted
palindromic recognition sequence.

Type of mutations detected by format-I RAA: In essence the method
of the invention aims to detect mutations affecting the recognition sequences of the
site-specific probe endonuclease reagents. When the probe enzyme cleaves a sample
fragment, it is prevented from being amplified and as a consequence the fragment will
not give a hybridization signal with its cognate probe. Mutations affecting the
recognition sequence of the probe enzyme will allow amplification of the sample
fragment and will restore the hybridization signal. It is recognized that mutations other
than those affecting the probe enzyme recognition sites may affect the hybridization
signals. In particular, mutations affecting the recognition sites of the sampling
enzymes may also lead to a loss of hybridization signal. Consequently, the mere
detection of a hybridization difference between two samples does not qualify the
difference as being due to an ESP for the probing enzyme. For this one must also
assay the two samples without probing enzyme cleavage; only those differences that are
correlated with the cleavage by the probing enzyme qualify as genuine ESPs as defined
according to the present invention. Therefore, a preferred embodiment of the methods
of the present invention comprises the comparison of the hybridization signals obtained
with and without cleavage of the same starting material by the probe endonuclease
reagent. Preferably, the digested and undigested sample DNAs are differentially
labeled such that equivalent amounts of the material can be mixed and hybridized
against the same array of probes. It is noted that a further advantage of measuring the
relative hybridization signals obtained with digested and undigested sample DNAs is
that the signal given by the undigested sample DNA serves as an internal control for
correcting variations in amplification and hybridization.

Identification and design of informative probes to detect
ESP-harboring fragments. In a preferred embodiment of the present invention sample
DNAs (amplicons) are hybridized to micro-arrays comprising a set of probe DNAs
which are designed such that each probe will hybridize specifically to one sample DNA
fragment. For each set of sample DNA fragments a specific set of probes is
developed that will detect all the ESPs present in the set of sample DNAs. Since in
most applications only a (minor) fraction of the sample DNAs will actually carry an
ESP for a particular probing reagent, the set of probe DNAs will preferably consist of
a subset of the sample DNA fragments that are informative in that they hybridize to
ESP-harboring sample fragments. Preferably, the probes are highly specific for the
ESP-carrying sample fragments, and do not cross-hybridize with other fragments in the
sample. This feature is verified by testing the candidate probes in control hybridization
assays. When developing or designing the probes, care should be taken to avoid
hybridization of the labeled primer used to amplify the sample fragments. When the
probes correspond to a subset of the sample fragments, preferably an alternative set of
adaptors should be used for their amplification.

The sections below describe different approaches that may be used to
assemble sets of unique probe DNAs for fabricating the micro-arrays. Three
alternative approaches are presented, and their choice is determined primarily by the
degree of nucleotide sequence variation, and hence the ESP frequency, present in the
species under study.

(1) Direct screening. When the ESP frequency is high, such that 10% or more of the
sample fragments carry ESPs, a realistic approach for assembling ESP probes is
to array individual sample fragments and test which of them detect an ESP in the
test material under study. The advantage of this approach is that the same set of
fragments can be tested with different probe enzymes. After the screening one
will retain only those probes that yield a clear-cut difference in hybridization
between the different test DNAs. This approach is illustrated in Example 2.

(2) Gel-based screening. With genomic DNA exhibiting intermediate ESP frequencies
(a few %), useful probes can be identified with a gel-based screening approach
in which the ESPs are identified by comparing the patterns of sample fragments
obtained from cleaved and uncleaved genomic DNA of various individuals. The
polymorphic fragments can then be isolated from the gel and cloned or amplified.
In a second phase, these probe-fragments are verified in a micro-array
hybridization assay. This approach is illustrated in Example 1.

(3) Batch-wise hybridization selection method. Since both approaches described above
are inefficient and labor intensive when the ESP frequency is low (<1%), it is
advantageous to directly select or enrich ESP-carrying fragments. Such an
approach is described in greater detail in Example 3.
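
The choice among the three approaches can be summarized as a simple decision rule. The Python sketch below is illustrative only: the text gives "10% or more", "a few %" and "<1%" as guide values, so assigning the whole range between 1% and 10% to gel-based screening is an assumption of this sketch, as is the function name.

```python
def choose_probe_strategy(esp_fraction: float) -> str:
    """Suggest a probe-assembly approach from the fraction of sample
    fragments expected to carry an ESP (0.0 to 1.0)."""
    if not 0.0 <= esp_fraction <= 1.0:
        raise ValueError("esp_fraction must lie between 0 and 1")
    if esp_fraction >= 0.10:
        return "direct screening"
    if esp_fraction >= 0.01:
        # Assumed boundary: "a few %" is read here as 1% up to 10%.
        return "gel-based screening"
    return "batch-wise hybridization selection"


for fraction in (0.15, 0.025, 0.005):
    print(fraction, choose_probe_strategy(fraction))
```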

The methods of the invention can be used with any type of micro-array:
spotted ESP-carrying fragments, spotted oligonucleotides or oligonucleotides
synthesized on solid supports using photolithography [Fodor S. P. A. et al., Science
251: 767-773 (1991)]. Oligonucleotide probes can easily be designed based on the
nucleotide sequences of the ESP-carrying fragments. Also, the methods of the
invention are not limited to the use of planar arrays containing spatially addressable
probes. A person of skill in the art will recognize that the methods may also employ
a multitude of identifiable solid phase particles (e.g. beads, spheres, and polyhedra),
each carrying a different probe. Examples of such use are described by Fulton, R.
[U.S. Patent No. 5,736,330] and Mandecki, W. [U.S. Patent No. 5,736,332].


(II) Format-II RAA: General outline

The 'format-I RAA' - as described above - can be converted to a
'format-II assay' when sufficient sequence information of ESP-containing sample
fragments becomes known. Format-II RAAs can also be designed on the basis of the
known sequences of genomic regions that harbor an ESP and that are available through
publicly accessible databases. The approach involves the targeted sampling of starting
material and consists of the design of dedicated primer pairs that flank the ESP sites.
As in format-I RAA, if the site is intact, the starting DNA will be cleaved and no
PCR product will be generated. Only when the site is mutated will the amplicon be
generated. In practice, multiple ESP-containing genomic regions are co-amplified after
cleavage with the probing restriction endonuclease reagent. The ultimate sample DNA
used in the hybridization reaction is composed of several such multiplex PCR reactions
pooled together. The feasibility of this approach is evidenced by the recent paper of
Wang et al., Science 280: 1077-1082 (1998), incorporated herein by reference. The
methods for format-II RAA described here are identical to the approach described by
Wang et al. in the way certain allelic regions are co-amplified, but fundamentally
different in the way they are diagnosed. The present method takes advantage of the
clear distinction between having or not having an amplicon depending upon the allelic
state of the endonuclease target site. The Wang et al. approach in contrast relies on the
detection of a hybridization difference as a result of a single nucleotide variation in the
PCR product. This requires a much more elaborate and redundant hybridization assay.

Similar to format-I RAA, a preferred method consists of comparing the
hybridization signals obtained with and without cleavage with the probe restriction
endonuclease reagent. Preferably, the respective amplification reactions are
differentially labeled such that the resulting amplicons can be mixed and hybridized
against the same array of probes.

Preferred methods of the format-II RAA are those wherein - of each
PCR primer pair - the primer that remained unlabeled is used as hybridization probe
for the corresponding amplicon. This ensures that the excess unincorporated labeled
primer as well as the primer extension products obtained with this primer cannot anneal
to the arrayed probe. Also, the unlabeled PCR primer is complementary to the labeled
strand of the amplicon.

Furthermore, the format-II RAA method provides a means to monitor
mutations in specific genes or loci in addition to scanning the entire genome. Indeed,
sets of PCR primers that target ESPs in a specific gene or chromosome region can be
assembled.

An RAA assay with positive detection of both alleles: It is
recognized that the 'present/absent-score' of the RAA assay cannot (always) distinguish
between different mutations that can affect cleavage by the probe restriction
endonuclease reagent. In practice, an ESP should not be assayed when available
evidence indicates the existence of two or more such mutations at significant
frequencies in the population.

In a preferred embodiment the present invention is directed to the
detection of SNPs that result in the simultaneous loss and gain of a restriction enzyme
recognition site, i.e. both alleles are associated with a different recognition site. HgaI
(GACGC) and SfaNI (GATGC) are an example of such reciprocal sites. Use of both
probing endonuclease reagents in side-by-side experiments excludes alternative alleles
and results in easy determination of the zygosity (refer to Example 4).
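
A minimal Python sketch of how side-by-side digests with reciprocal enzymes could be interpreted is given below. It assumes that each measurement is expressed as the ratio of the digested signal to the undigested control, that the two ratios should add up to about 1 when only the two reciprocal alleles are present, and that a tolerance of 0.25 is acceptable; the tolerance and the function name are assumptions of this sketch, not values from the text.

```python
def call_reciprocal_genotype(ratio_hgai: float, ratio_sfani: float,
                             tol: float = 0.25) -> str:
    """Genotype a GACGC/GATGC SNP from two parallel digests.

    ratio_hgai:  signal after HgaI digestion / undigested signal
                 (fraction of template lacking the GACGC site)
    ratio_sfani: signal after SfaNI digestion / undigested signal
                 (fraction of template lacking the GATGC site)
    """
    total = ratio_hgai + ratio_sfani
    if abs(total - 1.0) > tol:
        # E.g. an allele cut by neither enzyme, or a failed digest.
        return "inconsistent with two reciprocal alleles"
    if ratio_hgai < tol:
        return "homozygous GACGC"
    if ratio_sfani < tol:
        return "homozygous GATGC"
    return "heterozygous GACGC/GATGC"


print(call_reciprocal_genotype(0.02, 0.97))
print(call_reciprocal_genotype(0.95, 0.05))
print(call_reciprocal_genotype(0.50, 0.45))
print(call_reciprocal_genotype(1.00, 0.98))
```

The last call illustrates how an alternative allele that neither enzyme cuts would show up as an inconsistent pair of ratios.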

Multi-allelic haplotyping. A single ESP represents a bi-allelic marker,
which is less informative than a variable micro-satellite, which has multiple alleles.
It is possible however to compensate for the lower information content by identifying
several ESPs on a specific chromosomal region. Format-II RAA lends itself readily
to such an approach and involves the design of a primer pair that encompasses a region
with a single site for the various selected probe endonuclease reagents. It should be
recognized as one of the advantages of the present method that multiple ESPs on a
sample amplicon can be interrogated with a single probe. Furthermore, use of the
probing enzymes, either separately or in various combinations, in parallel experiments
allows the construction of the haplotypes for the ESPs under study. In general, the
statistical associations between traits and specific chromosome regions may be more
apparent when haplotypes rather than individual markers are used.



(III) Format-III RAA:

In a general sense, the format-III RAA represents a method of choice for
very high-density SNP genotyping because it provides a means to overcome the
intrinsic limitations of both the format-I RAA and the format-II RAA. This is
essentially achieved by performing a stepwise amplification involving a
pre-amplification of sample fragments followed by amplification using multiplexed specific
primers. The principal advantage of the pre-amplification step is to reduce the
complexity of the starting DNA, and thus to provide a more favorable starting point
for performing multiplex PCR reactions. It is noted that this improvement is generally
applicable to any multiplex PCR reaction, and is not limited to the methods of the
present invention. Such an approach can also be used when, for example, SNPs are
genotyped using the methods described by Wang et al.

The principal limitation of the format-I RAA lies in the complexity of
the sample DNA that is hybridized to the microarray. Because the second round of
amplification in format-III yields only very small amplicons, which are all informative,
there is no longer a limitation in the number of sample fragments that are interrogated. In
fact the entire genome may be sampled in a series of parallel pre-amplification
reactions and the amplicons generated in the different multiplex PCR reactions can then
be pooled together and hybridized to the microarray.

Likewise, the format-III RAA represents preferred methods of format-II
RAA, especially when the ESPs under study are located on fragments generated by one
set of sampling endonuclease reagents. Such stepwise amplification comprises the
co-amplification of sample fragments with a single pair of primers, followed by the
selective amplification of sets of specific ESP-containing regions (see Figure 5). The
principal advantage of the format-III RAA over format-II RAA is that the initial
amplification of the sampling fragments - representing only a fraction of the total
genome - lowers the amount of starting material required to interrogate very large
numbers of ESPs. Also, the approach will facilitate the multiplex amplification of the
ESP-specific amplicons and, consequently, yield a more robust assay.

One preferred embodiment of the format-III RAA is its use to genotype
large numbers of ESPs identified through the use of the format-I RAA. Indeed,
format-I RAA offers a rapid means to discover large numbers of ESPs in any biological
species where no large body of sequence information is or will be available. Format-I
RAA enables one to discover many sets of ESPs for a number of different probing
enzymes. Using the format-I RAA, each set of ESPs must be assayed on a different
microarray, because otherwise signals for the same sample fragment will overlap with
one another, and thus preclude determination of the proper ESP genotype. Using the
format-III RAA, the ESPs identified with different probing enzymes are now assayed
together on one single microarray, without overlap between the different ESPs. The
reason is that the overlap in the format-I RAA is caused by the non-informative sample
fragments that are always co-amplified with the ESP fragments. These are eliminated
from the mixture by the specific PCR amplification. This embodiment is illustrated in
Examples 2 and 3.

Another preferred embodiment of the format-III RAA is its use to
genotype large numbers of SNPs identified in high-throughput sequencing of genomic
DNA from different individuals from a given species. Given the generally recognized
importance of SNPs for the development of high-resolution genotyping methods,
sequenced SNPs can be expected to accumulate in large numbers in publicly available
databases in the near future. In particular, in the field of human genetic analysis, SNPs
will be discovered at a rapidly increasing rate through the massive genome sequencing
programs now in progress. A similar evolution may be anticipated for many other
species. Hence we decided to perform an in silico analysis of known human SNPs to
further investigate the potential of the invention. More particularly we have analyzed
the 3,358 SNP sequences present in the SNP database of the Whitehead Institute [Wang
et al., Science 280: 1077-1082 (1998)]. We have determined how many of these SNPs
represent an ESP for each of 34 known palindromic and non-palindromic tetra- and
penta-nucleotide restriction recognition sequences. When extrapolating this number to
the total number of ESPs in the human genome - assuming a grand total of 3 million
ESPs - it appears that the number of detectable ESPs per probing restriction enzyme
is in the range of 25,000 to 150,000. A cumulative analysis reveals that 53% of the
SNPs affect at least one of the 34 restriction sites; a total of 28% affect the recognition
site for one of the available tetracutter enzymes. The principal conclusion from this
analysis is that many of the considered enzymes - used as probing enzymes according
to the methods of the present invention - will interrogate sufficient SNPs to be able to
build a high-density map of the human genome. It should also be noted that the use of
multiple probing enzymes is easily accommodated in the targeted assay because the
sample has to be subdivided anyway over a number of parallel multiplex PCR
reactions. This embodiment is illustrated in Example 4.
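
The kind of in silico analysis described above can be sketched as follows. This is not the procedure actually used for Table I, which the text does not detail; it is a minimal Python illustration under stated assumptions. Each SNP is given as a left flank, a pair of alleles and a right flank (all hypothetical here), and a SNP counts as an ESP for a recognition sequence when the site overlapping the variable position is present for one allele and absent for the other. The extrapolation simply scales the observed fraction to the 3 million total assumed in the text.

```python
def reverse_complement(seq: str) -> str:
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def is_esp(left: str, alleles: tuple[str, str], right: str, site: str) -> bool:
    """True if the two alleles differ in carrying `site` across the SNP."""
    k = len(site)  # assumes k >= 2 (tetra- or penta-nucleotide sites)
    present = set()
    for allele in alleles:
        # Window of k-1 bases on each side: any hit must span the SNP.
        window = (left[-(k - 1):] + allele + right[:k - 1]).upper()
        present.add(site in window or reverse_complement(site) in window)
    return len(present) == 2


def extrapolate(n_esp: int, n_analyzed: int = 3358,
                genome_total: int = 3_000_000) -> int:
    """Scale an ESP count from the analyzed set to the assumed genome total."""
    return round(n_esp / n_analyzed * genome_total)


# Hypothetical SNP: A/G variation flanked by ...ACGTT and ACGT...
print(is_esp("ACGTT", ("A", "G"), "ACGT", "TTAA"))
print(is_esp("ACGTT", ("A", "G"), "ACGT", "GATC"))
print(extrapolate(100))
```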

It is noted that the format-III RAA may be performed according to
different procedures. One such procedure is diagrammed in Figure 5, in which the
test DNA is first sampled using a sampling endonuclease reagent, pre-amplified and
then treated with the probing endonuclease reagent. Variations on this procedure are
readily recognized by those skilled in the art and include, for example, concomitant
treatment of the test DNA with both the sampling and the probing endonuclease
reagents and the preparation of sampled DNA fragments using arbitrary PCR priming
methods [Williams et al., Nucleic Acids Res. 18: 6531-6535 (1990)]. Note that in case
the treatment with the probing endonuclease reagent is performed prior to the
pre-amplification, the subsequent amplification can be performed with any pair of PCR
primers directed against the ESP carrying fragments, thus overcoming the
limitation of using PCR primers flanking the ESPs.




Table I. Analysis of 3,358 SNPs in the Whitehead SNP database. The table lists the 
number of SNPs that represent an ESP for various probing enzymes. The last column 
shows the estimated number of ESPs for each enzyme in the entire human genome 
(refer to text for details). 

The following illustrative examples were chosen to represent the
spectrum of genomic complexities and the spectrum of degrees of genetic variation
which are susceptible to analysis using the methods of the present invention:

Example 1 describes analysis of Arabidopsis (low genomic complexity, 
low genetic variation). 

Example 2 describes genetic analysis of corn (high genomic complexity,
high genetic variation).

Examples 3 and 4 describe genetic analysis in humans (high genomic
complexity, low genetic variation).

Numbers given in the examples, and that relate to the occurrence
frequency of certain restriction sites as well as the average size of the generated
fragments, are in part based on computer simulations using publicly available DNA
sequences.

Example 1
Genetic Analysis in Arabidopsis

In this example, a fragment analysis-based approach is used to generate
a set of genomic fragments carrying ESPs between the Arabidopsis ecotypes Landsberg
and Columbia, which are commonly used for genetic studies in the model organism.
Arabidopsis is an example of a low complexity genome (size ~120 Mb), and the two
ecotypes exhibit a moderate level of genetic variability. Previous studies have revealed
that the average nucleotide sequence variation between the two ecotypes is in the order
of 1 polymorphism in 150 nucleotides. Consequently, the fraction of fragments expected
to carry an ESP for tetranucleotide recognizing restriction enzymes is expected to be
in the range of 2.5% (1:40). With such a low frequency, it is helpful to use a selection
procedure to isolate the rare fragments containing ESPs.
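
The estimate of about 2.5% can be reproduced with a short calculation. The Python sketch below assumes, for illustration only, that each fragment carries one 4-nucleotide probing-enzyme site and that polymorphisms occur independently at a rate of 1 per 150 nucleotides; the function name is invented for this sketch.

```python
def fraction_of_sites_hit(site_length: int, polymorphism_rate: float) -> float:
    """Probability that at least one base of a recognition site is polymorphic,
    assuming independent polymorphisms at a uniform per-base rate."""
    return 1.0 - (1.0 - polymorphism_rate) ** site_length


fraction = fraction_of_sites_hit(site_length=4, polymorphism_rate=1 / 150)
print(f"{fraction:.2%} of tetranucleotide sites, about 1 in {1 / fraction:.0f}")
```

Under these assumptions the result is close to the 1:40 figure quoted above.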

In essence the procedure described in this example comprises the
following steps:

1) Identification of a set of about 200 genomic fragments carrying
Landsberg/Columbia ESPs using a gel-electrophoretic approach.

2) Isolation and characterization of the ESP carrying DNA
fragments (ESP fragments).

3) Generation of micro-arrays with the ESP fragments.

4) Confirmation of the ESPs by hybridization.



Step 1. Identification of ESP fragments

Sampling enzymes. In the present example EcoRI, a restriction enzyme
recognizing 6 nucleotides (hexacutter), in combination with BfaI, a restriction enzyme
recognizing 4 nucleotides (tetracutter), are chosen as sampling enzymes. From the
random frequency of occurrence of 6 nucleotide sequences (every 4,000 bases), the
number of sites for hexacutter restriction enzymes in this genome is predicted to be in
the range of 30,000. In addition to cleavage with a hexacutter, the genomic DNA is
also cut with a tetracutter so as to generate PCR amplifiable fragments of an average
size of a few hundred base pairs. Cleavage with the two enzymes gives rise to two
types of fragments: a majority of fragments resulting from cleavage by the tetracutter
enzyme alone and a smaller set of fragments produced by the two enzymes (see Figure
5). Since the majority of the hexacutter fragments will give rise to two fragments
having a hexacutter end and a tetracutter end (see Figure 5), this procedure will yield
a mixture of about 60,000 fragments of this type. Upon amplification using the
procedure described below only the fragments carrying a tetracutter end and a
hexacutter end are amplified efficiently (Figure 5).
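
The figures of about 30,000 sites and about 60,000 fragments follow from simple arithmetic, sketched here in Python. The calculation assumes equal frequencies of the four bases and a random sequence, which is only an approximation for a real genome.

```python
def expected_sites(genome_size_bp: int, site_length: int) -> float:
    """Expected number of occurrences of one recognition sequence in a
    random sequence with equal base frequencies."""
    return genome_size_bp / 4 ** site_length


hexacutter_sites = expected_sites(120_000_000, 6)   # one site per 4,096 bases
amplifiable_fragments = 2 * hexacutter_sites        # two type II fragments per site

print(round(hexacutter_sites), round(amplifiable_fragments))
```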

Probing enzymes. As probing enzymes many different tetracutter
enzymes can be used. Ideally, the probing enzyme cleaves most of the sample
fragments once. Because plant DNA has a high AT content, the preferred tetracutters
are those that have an AT bias in their recognition sequence. In general, the choice of
an optimal tetracutter may be determined by particular features of the genome being
analyzed (e.g., AT and GC content). In the present example, MseI (recognition site =
TTAA) was chosen. Tsp509I (recognition site = AATT) is an alternative. It is also
conceivable to use mixtures of two or more tetracutter enzymes. The EcoRI-BfaI
sample/target fragments that are cleaved and not cleaved with the MseI probing enzyme
are referred to as cleaved and uncleaved sample/target DNA, respectively.

Screening for ESP carrying fragments. To detect ESP fragments, subsets
of uncleaved and cleaved EcoRI-BfaI sample fragments from both ecotypes are
amplified and the amplicons are compared following gel-electrophoretic fractionation.
Subsets of the EcoRI-BfaI sample fragments are selectively amplified as described
[Vos, P. et al., Nucleic Acids Res. 23: 4407-4414 (1995); Zabeau, M. and Vos, P.,
European Patent Application EP 0534858 (1993), both of which are incorporated herein
by reference]. Given the complexity of the sample (~50,000 fragments), the selective
amplifications are performed with EcoRI and BfaI primers having two and three
selective nucleotides, respectively. This equals 1024 (16 x 64) different selective
amplification reactions.

The experimental procedure described by Vos P. et al. is followed
except that the template fragments are incubated at 65°C for 10 minutes to
heat-inactivate the T4 ligase enzyme, and, when applicable, digested with the probing
enzyme prior to amplification. The structures of the EcoRI and BfaI adaptors are as
follows [see, e.g., Vos, P. et al., supra]:
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5 • - CTCGTAGACTGCGTACC (SEQ ID NO: 1) 

CATCTGACGCATGGTTAA-5 ' (SEQ ID NO: 2) 

5 5' -GACGATGAGTCCTGAG (SEQ ID NO: 3) 

TACTCAGGACTCAT-5' (SEQ ID NO: 4) 

The EcoRI (radiolabeled by 5'-phosphorylation) and BfaI primers,
having two and three selective nucleotides, respectively, have the following sequences
(where N represents A, C, G, or T):

5'-GACTGCGTACCAATTCNN (SEQ ID NO: 5)
5'-GATGAGTCCTGAGTAGNNN (SEQ ID NO: 6)
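
By way of illustration only, the count of 1024 selective amplification reactions can be checked by expanding the degenerate N positions of the two primers. The following Python sketch is not part of the described procedure; the function name expand_selective is hypothetical, and the primer strings are those of SEQ ID NO: 5 and SEQ ID NO: 6.

```python
from itertools import product

ECORI_PRIMER = "GACTGCGTACCAATTCNN"    # SEQ ID NO: 5, two selective nucleotides
BFAI_PRIMER = "GATGAGTCCTGAGTAGNNN"    # SEQ ID NO: 6, three selective nucleotides


def expand_selective(primer):
    """Return every concrete primer obtained by replacing each trailing N with A, C, G or T."""
    core = primer.rstrip("N")
    n_selective = len(primer) - len(core)
    return [core + "".join(ext) for ext in product("ACGT", repeat=n_selective)]


ecori_primers = expand_selective(ECORI_PRIMER)
bfai_primers = expand_selective(BFAI_PRIMER)

# One selective amplification reaction per primer combination.
primer_pairs = [(e, b) for e in ecori_primers for b in bfai_primers]
print(len(ecori_primers), len(bfai_primers), len(primer_pairs))  # 16 64 1024
```

Each element of primer_pairs corresponds to one of the 16 x 64 reactions mentioned above.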


Using these reagents, most of the obtainable target fragments contain a
cleavage site for the probing enzyme and, consequently, will not be amplified when the
target DNA is cleaved. Most of the fragments that survive the treatment with the
probing enzyme occur in both ecotypes, and thus carry no ESP. Occasionally, fragments
are found that appear in both ecotypes when the target DNA is not digested and that
are present in only one of the two ecotypes after digestion. These represent true ESPs
for the probing enzyme. In addition, fragments will also be found that show typical
AFLP polymorphism between the two ecotypes [Vos, P. et al., Nucleic Acids Res. 23:
4407-4414 (1995)]. Such polymorphisms are apparent in the fragment patterns
obtainable with the undigested sample DNAs. A typical result is shown in Figure 6, in
which the electrophoretic patterns are shown of selectively amplified EcoRI-BfaI
fragments from the ecotypes Columbia and Landsberg obtained without and with
digestion with the MseI probing enzyme.
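
The band classes distinguished in the preceding paragraph can be summarized as a small decision rule. The Python sketch below is illustrative only: it assumes that each band has already been scored as present or absent in four lanes (Columbia and Landsberg, each without and with probing-enzyme digestion), and the function name and class labels are hypothetical.

```python
def classify_band(col_uncut, lan_uncut, col_cut, lan_cut):
    """Classify one gel band from its presence (True) or absence (False) in four lanes."""
    if col_uncut != lan_uncut:
        # The band already differs between ecotypes without probing digestion.
        return "AFLP polymorphism"
    if not col_uncut:
        return "absent"  # not amplified from either ecotype
    if col_cut and lan_cut:
        return "monomorphic, no probing site"
    if not col_cut and not lan_cut:
        return "monomorphic, site in both"
    # Present in both undigested lanes, but survives digestion in only one ecotype.
    return "ESP"


print(classify_band(True, True, True, False))  # ESP
```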

Systematic comparison of the patterns of ecotypes Columbia and
Landsberg before and after digestion allows the identification of EcoRI-BfaI sample
amplicons that carry an ESP for the probing enzyme. Using MseI as probing enzyme,
it is estimated that a total of ~200 polymorphic fragments which are present in only one
of the ecotypes can be identified.

Step 2. Isolation and characterization of ESP fragments.

Each of the ESP polymorphic fragments is eluted from the gel matrix,
re-amplified and cloned into a suitable plasmid vector (e.g., TA cloning system;
Invitrogen, Carlsbad, CA, U.S.A.). In each case, two clones are selected for sequence
determination. Most duplicate clones will yield the same sequence. Duplicate clones
that gave different sequences were not retained for further work. Since the nucleotide
sequence of over one third of the Arabidopsis genome is available in the public
databases (e.g., GenBank), the chromosomal location of one third of the ESP fragments
can be determined by matching the fragment sequences to the genomic sequence.
Furthermore, since the genomic sequence is derived from ecotype Columbia, we expect
a perfect match with the fragment sequences isolated from the same ecotype. The
sequences of the fragments isolated from ecotype Landsberg will reveal single
nucleotide differences, among which the potential restriction site mutations affecting
the MseI recognition sites should be apparent.
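
As an illustration of how such restriction site mutations could be read off from a sequence comparison, the Python sketch below lists the MseI sites (TTAA) that are present in one sequence but not in the other. It assumes two gap-free sequences of equal length that are already aligned; the function names and the two toy sequences are hypothetical and are not data from this example.

```python
MSEI_SITE = "TTAA"


def site_positions(seq, site=MSEI_SITE):
    """Return the 0-based start positions of every occurrence of the site."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def site_differences(columbia, landsberg, site=MSEI_SITE):
    """Return (sites only in Columbia, sites only in Landsberg) for two aligned sequences."""
    if len(columbia) != len(landsberg):
        raise ValueError("sequences must be aligned and of equal length")
    col = set(site_positions(columbia, site))
    lan = set(site_positions(landsberg, site))
    return sorted(col - lan), sorted(lan - col)


# Toy sequences differing by a single T -> C substitution inside the TTAA site.
columbia = "GAATTCACGTTAAGCTCTAG"
landsberg = "GAATTCACGTCAAGCTCTAG"
print(site_differences(columbia, landsberg))  # ([9], [])
```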

In addition to the ESP polymorphic fragments, a number of
non-polymorphic control fragments are processed in the same way. Two types of such
monomorphic control fragments are isolated: fragments that do not carry a site for the
probing enzyme and fragments that carry a site for the probing enzyme in both
ecotypes. These fragments will serve the purpose of verifying the hybridization on the
micro-arrays.

Step 3. Fabrication of ESP micro-arrays.

Micro-arrays of amplified fragments. The insert DNAs from the
sequence-verified clones are amplified, e.g., with the use of non-selective EcoRI and
BfaI primers. PCR products are verified by agarose gel electrophoresis and retained if
a single product of the correct mobility was present. Following ethanol precipitation,
the resuspended PCR products are arrayed at high density on standard glass slides (25
x 76 mm) using either the Multigrid robotic spotter (GeneMachines™, Genomic
Instrumentation Services Inc., Menlo Park, CA, U.S.A.) or the BioChip Arrayer™
(Packard Instrument Company, Meriden, CT, U.S.A.). The DNAs are spotted in a
logical order with respect to the ecotype from which the fragments were isolated (upper
and lower panel) as shown in Figure 7. In addition, a set of DNAs from monomorphic
control fragments was spotted next to the ESP fragment DNAs (right panel in Figure
7).

Micro-arrays of oligonucleotides. Based on the nucleotide sequences of
the ESP fragments, oligonucleotides can be designed that can serve as hybridization
probes to specifically detect each amplified sample fragment. The oligonucleotide probe
should preferably match a sequence that is located to one side of the ESP,
opposite the side where the sequence targeted by the labeled primer is located. In this
way the background is minimized because the linear amplification products generated
by the labeled primer following digestion with the probing enzyme are not detected.
The ESP fragment-specific oligonucleotides are spotted in a micro-array format in
exactly the same way as the amplified ESP fragments.

Step 4. Micro-array-based detection of ESPs.

Preparation of the sample DNAs. For each ecotype, sample DNA is
prepared in two different ways. Genomic DNA, digested with the sampling restriction
enzymes EcoRI and BfaI, was amplified either as such or after cleavage with the
probing enzyme MseI. The amplification reactions are performed with a fluorescently
labeled EcoRI primer and an unlabeled BfaI primer, both without selective nucleotides.
The EcoRI primer is labeled by incorporation of Cy3(green)- and Cy5(red)-amidites
during primer synthesis (Amersham Pharmacia Biotech, Uppsala, Sweden). For both
Columbia and Landsberg, the cleaved sample was amplified with a Cy3-primer while
the uncleaved fragments were amplified with a Cy5-labeled EcoRI primer. In addition,
the Landsberg digested material was also amplified with a Cy5-labeled EcoRI PCR
primer. Three different hybridization solutions are then prepared by mixing equal
amounts (i.e., equal volumes) of the Cy3- and Cy5-labeled amplification reactions: one
from the Columbia cleaved and uncleaved samples, a second from the Landsberg
cleaved and uncleaved samples, and a third by mixing the differentially labeled cleaved
samples of both ecotypes.

In case arrays of PCR products, rather than oligonucleotides, are used
as probes (refer to step 3), the co-amplification of the EcoRI-BfaI sample fragments is
preferably accomplished with a pair of adaptors that differs from those attached to the
arrayed probes. The alternative EcoRI and BfaI adaptors have the following structure:

5'-GAGCATCTGACGCATCC (SEQ ID NO: 26)
   GTAGACTGCGTAGGTTAA-5' (SEQ ID NO: 27)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)
   ATGAGTCCTGACAT-5' (SEQ ID NO: 14)

The cognate non-selective EcoRI and BfaI primers have the following
sequences:

5'-CTGACGCATCCAATTC (SEQ ID NO: 28)
5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

Micro-array hybridization. Each of the hybridization solutions is allowed
to hybridize to the arrayed probes using protocols well known in the art. The
experimental conditions depend primarily on the nature of the probes: PCR-amplified
fragments versus oligonucleotides. Both types of experiments are amply described in
the literature: Wodicka, L. et al., Nature Biotechnol. 15: 1359-1367 (1997); Lockhart, D.
J. et al., Nature Biotechnol. 14: 1675-1680 (1996); DeRisi, J. L. et al., Science 278:
680-686 (1997); Shalon, D. et al., Genome Res. 6: 639-645 (1996); Piétu, G. et al.,
Genome Res. 6: 492-503 (1996); Chee, M. et al., Science 274: 610-614 (1996); Wang,
D. G. et al., Science 280: 1077-1082 (1998); Winzeler, E. A. et al., Science 281:
1194-1197 (1998), all of which are incorporated herein by reference.

A laser scanning system (ScanArray 3000; General Scanning Inc.,
Watertown, MA, U.S.A.) is used to detect the two-color fluorescence hybridization
signals from the micro-arrays at a resolution of 10 microns per pixel. A separate scan
is carried out for each of the two fluorophores used. Scanning parameters and laser
power settings are adjusted to normalize the signal in the two channels (channel-1/Cy3;
channel-2/Cy5). The obtained digital images were analyzed using the ImaGene™ image
analysis software (BioDiscovery Inc., Los Angeles, CA, U.S.A.). The extracted
quantitative data are transferred to a spreadsheet for further analysis.

The present hybridization experiment is essentially set up as a
confirmation of the gel-electrophoretic data (refer to step 1), and has, therefore, a
predictable outcome. In addition, a number of control probes are included on the
biochip that detect monomorphic EcoRI-BfaI Arabidopsis fragments (i.e., fragments
on which a site for the probing enzyme is either present or absent in both ecotypes).
The results from these control probes allow correction for background and optical
cross-talk between the two channels, as well as calibration of the red and green
hybridization signals. It is anticipated that the vast majority of the processed data are
unambiguous with respect to the allelic state of a sample fragment and in agreement
with the gel-electrophoretic analysis. Figure 7 shows a false-color representation of the
idealized results of the present experiment using a fictitious array of probes. It cannot
be excluded that certain hybridization results are not in agreement with the
gel-electrophoretic assay and/or that certain probes do not allow unambiguous
determination of the allelic state of the cognate sample fragment. Such probes should
be excluded from the micro-arrays that are used to genotype experimental Arabidopsis
samples other than the Columbia and Landsberg controls used in the present
illustrative example.

In routine genotyping experiments, either one of the hybridization
schemes outlined above can be used. Determination of the allelic state can be done by
comparing the hybridization signals obtained with and without cleavage of the starting
DNA with the probe reagent. Alternatively, allele calling could be based on a
comparison of the signals obtained with the test sample and an appropriate control (e.g.,
Columbia or Landsberg DNA), both cleaved with the probe endonuclease reagent. The
samples that need to be compared can, in principle, be hybridized separately, but a
preferred method consists of hybridizing a mixture of differentially labeled samples to
the same array.
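
The first scheme (signals with and without cleavage) can be sketched as a ratio calculation. The Python example below is a minimal sketch under stated assumptions: the control probes used for calibration are taken to be monomorphic fragments lacking the probing site, the samples are assumed to be homozygous (as for inbred ecotypes), and the thresholds 0.1 and 0.5 as well as all probe names and intensity values are hypothetical.

```python
from statistics import median


def normalized_ratios(cleaved, uncleaved, control_ids):
    """Cleaved/uncleaved signal ratio per probe, rescaled so that control probes
    without a probing site have a median ratio of 1."""
    raw = {probe: cleaved[probe] / uncleaved[probe] for probe in cleaved}
    scale = median(raw[probe] for probe in control_ids)
    return {probe: ratio / scale for probe, ratio in raw.items()}


def call_site(ratio, low=0.1, high=0.5):
    """Call the allelic state of the probing-enzyme site (thresholds are assumptions)."""
    if ratio <= low:
        return "site present"   # signal lost after cleavage
    if ratio >= high:
        return "site absent"    # signal unaffected by cleavage
    return "ambiguous"


# Hypothetical intensities: Cy3 = cleaved sample, Cy5 = uncleaved sample.
cy3 = {"ctrl1": 900.0, "ctrl2": 1100.0, "esp1": 40.0, "esp2": 1000.0}
cy5 = {"ctrl1": 1000.0, "ctrl2": 1000.0, "esp1": 1000.0, "esp2": 950.0}

ratios = normalized_ratios(cy3, cy5, control_ids=["ctrl1", "ctrl2"])
calls = {probe: call_site(ratio) for probe, ratio in ratios.items()}
print(calls["esp1"], "/", calls["esp2"])
```

Probes that fall between the two thresholds are reported as ambiguous rather than forced into a class.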



Example 2
Genetic Analysis in Corn

In this example, the utility of the method of the invention for
marker-assisted selection applications in plant and animal breeding is illustrated. Corn
has been chosen because it is a typical representative of crop species having a complex
genome. The large size of the genome (2,400 Mb), the frequent occurrence of repetitive DNA
sequences and the high degree of genetic variation all constitute technical challenges.
In this example, an approach is used that is based on the generation of a set of genomic
fragments carrying ESPs from two well-known inbred lines of corn, B73 and Mo17, from
which many of the corn elite lines are derived. Another reason for choosing these
lines is that a well-studied recombinant inbred population derived from these lines is
available. This population can be used to map the set of ESPs. The genetic map of ESP
markers will prove to be an effective tool for genetic selection in corn breeding. It is
evident, however, that a broader survey of the corn germplasm with a total of 10 to 20
lines will give a large number of additional ESPs (possibly 2 or 3 times as many) and
will eventually result in a higher-resolution genetic map.

The ESP-harboring fragments could very well be identified by the
gel-electrophoretic approach described for Arabidopsis (Example 1). However, an
alternative strategy may be used given that the corn germplasm, like many crop
species, exhibits a high degree of genetic variation. Indeed, based on previous studies,
the average nucleotide sequence variation in the corn germplasm is estimated to be on
the order of 1 difference in 15 to 30 nucleotides. This corresponds to a frequency of
ESPs in the recognition sites of tetracutter restriction enzymes of 1 in 4. At this
frequency it becomes feasible to directly examine arrays of random B73/Mo17
fragments for the presence of ESPs using the present RAA method without prior
screening or selection. The strategy also lends itself readily to screening with several
different probing enzymes.
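
The relation between the per-nucleotide variation and the per-site ESP frequency can be approximated as follows. This Python sketch is an illustrative simplification, not the calculation used for the estimate above: it assumes that differences occur independently at each position and that any difference within the four-base site abolishes cleavage. Under these assumptions the upper end of the range (1 difference in 15 nucleotides) gives roughly 1 site in 4, and the lower end roughly 1 site in 8.

```python
def site_polymorphism_fraction(per_base_rate, site_length=4):
    """Probability that at least one position of a recognition site differs,
    assuming independent differences at each position."""
    return 1.0 - (1.0 - per_base_rate) ** site_length


for nucleotides_per_difference in (15, 30):
    fraction = site_polymorphism_fraction(1.0 / nucleotides_per_difference)
    print(f"1 difference in {nucleotides_per_difference} nt: {fraction:.2f} of tetracutter sites")
```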

In the present example, two different approaches for assaying ESPs are
used. The first method (format-I RAA) is similar to the one described in Example 1,
and detects ESPs in fragments sampled with a pair of restriction enzymes. In the
second method (format-III RAA), individual ESPs are selectively amplified from the
sampled fragments with dedicated primer sets. The principal advantage of the latter
approach is that ESPs detected with several different probing enzymes can be assayed
simultaneously, and that multiplex amplification of ESP-specific PCR products is made
considerably more robust.

In essence, the procedure described in this example comprises the
following steps:

1. Identification of a set of candidate ESP fragments from the
inbred lines B73 and Mo17

2. Development of a corn ESP micro-array

3. Genetic mapping of a B73/Mo17 recombinant inbred population
and of segregating populations

Step 1. Identification of candidate ESP fragments

Cloning of a set of sample fragments. To clone a set of random
fragments from the inbred lines B73 and Mo17, the enzyme combination PstI and BfaI
is used. The hexanucleotide-recognizing enzyme PstI was chosen because of the large
size of the corn genome. It is estimated that this enzyme has around 30,000 sites in the
corn genome. The second enzyme, the tetracutter BfaI, is expected to cleave in the
majority of the cases on both sides of the PstI sites. The double digestion will therefore
generate about 60,000 sample fragments with an average size of 400-500 base pairs.

Following double digestion of the genomic DNA, PstI and BfaI
adaptors were ligated to the fragment ends and the material was amplified with
non-selective PstI and BfaI primers. The structures of the PstI and BfaI adaptors are based
on those described by Vos, P. et al., Nucleic Acids Res. 23: 4407-4414 (1995):

5'-CTCGTAGACTGCGTACATGCA (SEQ ID NO: 7)
3'-CATCTGACGCATGT (SEQ ID NO: 8)

5'-GACGATGAGTCCTGAG (SEQ ID NO: 3)
3'-TACTCAGGACTCAT (SEQ ID NO: 4)

The corresponding PstI and BfaI non-selective primers have the following sequences:

5'-GACTGCGTACATGCAG (SEQ ID NO: 9)
5'-GATGAGTCCTGAGTAG (SEQ ID NO: 10)

The amplification step enriches the PstI-BfaI fragments over the large
excess of BfaI-BfaI fragments. After amplification, the fragments are fractionated on
an agarose gel to eliminate the fragments smaller than 100 base pairs, and cloned in an
appropriate vector (e.g., TA cloning system; Invitrogen, Carlsbad, CA, U.S.A.).

Preparation of spotted micro-arrays with the cloned sample DNA
fragments. The insert DNAs from the two libraries of cloned PstI-BfaI sample
fragments (obtained from the B73 and Mo17 inbred lines) are amplified from the
clones using the non-selective PstI and BfaI primers. Following purification and
concentration, the amplicons are arrayed as described in Example 1. A total of 20,000
(i.e., 10,000 from each library) candidate probe DNAs are spotted.

Micro-array hybridization and selection of candidate ESP fragments.
From genomic DNA of the inbred lines B73 and Mo17, four different sets of
PstI/BfaI-digested amplified DNA are prepared. An alternative pair of adaptors and
non-selective amplification primers is used for this:

5'-GAGCATCTGACGCATGTTGCA (SEQ ID NO: 11)
3'-GTAGACTGCGTACA (SEQ ID NO: 12)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)
3'-ATGAGTCCTGACAT (SEQ ID NO: 14)

5'-CTGACGCATGTTGCAG (SEQ ID NO: 15)
5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

The sample fragments are amplified either as such or after digestion with
one of three alternative probing enzymes, MseI, Tsp509I and AluI. As probing
enzymes, many different tetracutter or pentacutter enzymes can be used. Because plant
DNA has a high AT content, the preferred enzymes are those that have an AT bias in
their recognition sequence. Alternatively, mixtures of two or more tetracutter or
pentacutter enzymes can be used.

For each of the B73 samples, a Cy3(green)-labeled PstI primer is used,
whereas the Mo17-derived fragments are amplified with a Cy5(red)-labeled PstI primer
(refer to Example 1). Different hybridization solutions are then prepared by mixing
equal amounts of the uncleaved, MseI-cleaved, Tsp509I-cleaved, and AluI-cleaved
samples of both inbred lines. Each of the 4 mixes is allowed to hybridize to the
micro-arrays. Analysis of the scanned images involved normalization using the multitude of
probes on the arrays that detect monomorphic fragments. Figure 8 shows a false-color
representation of the idealized results of the present experiment using a fictitious array
of probes.

Analysis reveals that candidate ESP fragments are readily identified by
scoring the probes that hybridize with only one of the two inbred line sample DNAs
after cleavage with the probe enzyme (Figure 8). The quantitative analysis allows
the use of an unambiguous cut-off threshold of a 10-fold difference in the normalized
signal intensities for scoring ESPs. It should be pointed out that the assay identifies
both bona fide ESPs and polymorphisms in the sampling enzyme sites. Most of the
latter polymorphisms result in a marked hybridization difference with the sample DNAs
not cleaved with the probe enzyme (see Figure 8). Analysis of 180 probes reveals that
roughly 6% of the sample fragments carry ESPs for MseI, Tsp509I, or AluI, in
accordance with the expected ESP mutation frequency. The analysis of 20,000 cloned
probe fragments is thus expected to yield a total of 1,200 fragments carrying ESPs for
the three probe enzymes tested. By using additional tetracutter and pentacutter
enzymes (see Table I), the fraction of ESP-carrying fragments may be as high as 25%,
amounting to 5,000 ESPs.

Of all probes that exhibit a differential hybridization with the cleaved
sample DNAs, only those in which the recognition site for the probing enzyme is
present were retained for development of a corn micro-array. Sequence determination
of these probe fragments reveals the position of the recognition site for the probe
enzyme. Thus, we retained only those probes that failed to give a signal with the
cleaved sample DNA from the same inbred line from which they were isolated. Such
probes exhibit the hybridization pattern shown in the table below and are marked
with an arrow in Figure 8.

B73/Mo17 (Cy3/Cy5) normalized hybridization signal

               Undigested    MseI/Tsp509I/AluI-digested
B73-probes     ~1            < 0.1
Mo17-probes    ~1            > 10
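
The scoring rule implied by the table and by the 10-fold cut-off can be sketched as follows. The Python function below is illustrative only; its name and labels are hypothetical, and applying the same 10-fold window to the undigested ratio in order to flag sampling-site polymorphisms is an assumption made for this sketch.

```python
def classify_probe(undigested_ratio, digested_ratio, fold=10.0):
    """Classify a probe from normalized B73/Mo17 (Cy3/Cy5) signal ratios."""
    low, high = 1.0 / fold, fold
    if not (low < undigested_ratio < high):
        # The signals already differ markedly without probing-enzyme cleavage.
        return "sampling-site polymorphism"
    if digested_ratio < low:
        return "ESP: site present in B73 only"    # B73 signal lost after cleavage
    if digested_ratio > high:
        return "ESP: site present in Mo17 only"   # Mo17 signal lost after cleavage
    return "no ESP detected"


print(classify_probe(1.0, 0.05))   # ESP: site present in B73 only
print(classify_probe(1.1, 15.0))   # ESP: site present in Mo17 only
```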



Step 2. Development of a corn ESP micro-array

Sequencing of the candidate ESPs and design of marker-specific primers.
Clones corresponding to the probes that yield the desired hybridization pattern (Figure
8) are sequenced. The majority of the insert DNAs derived from these clones contain
a single recognition site for the probing enzyme. For each unique candidate ESP, two
specific PCR primers, flanking the restriction site, are designed.

In addition, the sequence of a limited set of probes that yielded invariant
hybridization signals is also determined. PCR primers targeting these monomorphic
sequences are included as references; they are used to calibrate the hybridization
signals.

Validation of the candidate ESPs and fabrication of corn micro-arrays.
The candidate ESPs identified under step 1 are subjected to a confirmatory
experiment using the format-III approach. First, four pre-amplification reactions are
performed with a single primer pair and using the PstI-BfaI fragments, undigested or
digested with one of the three probing enzymes, as template material. These
amplification reactions reduce the complexity of the DNA under study by more than
two orders of magnitude while at the same time generating a large enough amount of
material for the subsequent multiplex marker-specific PCRs. The pre-amplifications
are then used for the PCR rescue of each of the characterized candidate ESPs using
dedicated primer couples [refer to Wang, D. G. et al., Science 280:1077-1082 (1998)].
Particular sets of the ESP-specific primers that amplify the same type of ESP (i.e.,
ESPs for one particular probing enzyme) are combined in a single reaction, together
with the appropriate pre-amplification material as template. One of the ESP-specific
primers is either Cy3- or Cy5-labeled; the other remains unlabeled. The Cy3-primers
are used for the multiplex amplification of the DNA that has previously been digested
with a probing enzyme, whereas the Cy5-primers are used with undigested control
DNA. The PCR products from the various multiplex reactions performed on both
digested and undigested DNA are pooled to obtain a single hybridization
mixture per starting DNA. The B73- and Mo17-derived material is analyzed in
parallel experiments. The set of ESP-specific unlabeled PCR primers serves as
hybridization probes and is arrayed in the same way as amplification products.
Conditions used are similar to those previously described for hybridization against
oligonucleotide probes and are readily determined by one of ordinary skill in the art.

Direct comparison of the normalized Cy3 and Cy5 hybridization signals
allows determination of the allelic state of the endonuclease target site in B73 versus
Mo17. Primer pairs that do not allow unambiguous allele calling or that do not
confirm the candidate ESPs identified with PstI-BfaI sampling (refer to step 1) are not
retained for further work.



Step 3. Genetic analysis of a B73/Mo17 recombinant inbred population and of
segregating populations

Genetic analysis of a B73/Mo17 inbred population. A collection of
recombinant inbred lines derived from a cross between B73 and Mo17 is publicly
available and provides a most useful set of lines for verifying and mapping the
collection of ESP markers. The advantage of recombinant inbred lines over segregating
populations is that each inbred line contains a different set of homozygous chromosome
segments derived from either parent line. Consequently, each ESP will be scored as
either present or absent. Preparation of the sample DNAs and hybridization against the
arrayed probes are performed as described under step 2. The experiment will, in the
first place, allow the testing of selected ESPs in over 100 measurements; the results
will lead to the development of a second-generation system that will only detect the
most consistent ESPs. In addition, the linkage analysis of the segregation data will
allow the construction of a fine genetic map of the markers. Finally, based on the
mapping data, an ordered ESP micro-array is developed for corn.

Genetic analysis of segregating populations. Although the markers are derived
from only two inbred lines, it is anticipated that the above-mentioned ordered ESP
micro-arrays will detect sufficient genetic polymorphism in other corn lines to be useful
for marker-assisted selection. To demonstrate the applicability, one could choose either a
segregating F2 population or a back-cross population. Sample preparations and
hybridizations are again performed as described under step 2. In this experiment, the
ESP markers must be scored quantitatively so as to differentiate between heterozygosity
and homozygosity. Because only the most consistent markers are retained, a two-fold
difference in signal intensity is easily monitored. The approach used consists of
normalizing the hybridization signal intensities and then applying a mixture model
analysis to the normalized data. This statistical approach consists of determining
whether the relative signal intensities can be grouped into three discrete classes,
corresponding to homozygous present, heterozygous and homozygous
absent, respectively. ESP markers that do not fulfill this criterion should be eliminated
from the analysis.
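
The text does not specify the mixture model, so the following Python sketch shows only one possible reading: a three-component one-dimensional Gaussian mixture fitted by expectation-maximization to the normalized signals of a single marker. All function names, the separation criterion (min_gap) and the example intensities are hypothetical.

```python
import math

LABELS = ("homozygous absent", "heterozygous", "homozygous present")


def _densities(x, means, sds, weights):
    """Weighted Gaussian density of x under each class (constant factor dropped)."""
    return [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
            for m, s, w in zip(means, sds, weights)]


def fit_three_classes(values, iterations=100):
    """Fit a 3-component 1-D Gaussian mixture by EM; returns (means, sds, weights)."""
    xs = sorted(values)
    n = len(xs)
    means = [xs[0], xs[n // 2], xs[-1]]        # start at minimum, median, maximum
    sds = [(xs[-1] - xs[0]) / 6 or 1.0] * 3
    weights = [1 / 3] * 3
    for _ in range(iterations):
        # E-step: responsibility of each class for each value.
        resp = []
        for x in xs:
            dens = _densities(x, means, sds, weights)
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update mean, spread and weight of each class.
        for k in range(3):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
            weights[k] = nk / n
    return means, sds, weights


def classes_are_discrete(means, sds, min_gap=3.0):
    """Accept a marker only if adjacent class means are well separated."""
    return all(means[k + 1] - means[k] > min_gap * (sds[k] + sds[k + 1]) for k in range(2))


def assign_class(x, means, sds, weights):
    """Assign a normalized signal to its most probable class."""
    dens = _densities(x, means, sds, weights)
    return LABELS[dens.index(max(dens))]


# Hypothetical normalized signals of one marker across an F2 population.
signals = [0.02, 0.05, 0.03, 0.06, 0.04, 0.48, 0.52, 0.50, 0.47,
           0.53, 0.55, 0.45, 0.98, 1.02, 1.00, 0.97, 1.03]
means, sds, weights = fit_three_classes(signals)
print(classes_are_discrete(means, sds), assign_class(0.51, means, sds, weights))
```

A marker for which classes_are_discrete returns False would be a candidate for elimination under the criterion described above.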



Example 3

Human Genetic Analysis Using the Format-I RAA

This example illustrates the application of the method of the invention
for genome-wide genetic analysis in humans. The human genome is an example of a
high-complexity genome (size ~3,000 Mb) combined with a very low level of genetic
variability. Single nucleotide differences between pairs of allelic sequences from
different individuals occur approximately once in every 1000 base pairs; in the
population at large, the frequency may be on the order of 1:300. As with Arabidopsis,
such a low frequency necessitates the use of a selection procedure for the
isolation/enrichment of the rare ESP-harboring fragments. In this example, a batch-wise
hybridization is used to accomplish this.

Based on the known mutation frequencies, it can be estimated that the
ESP frequency for a tetracutter probing enzyme is on the order of 1 in 125 recognition
sites. This low level of genetic variation, in combination with the sensitivity of
micro-array hybridization, limits the number of ESPs that can be detected in a single assay
(typically ranging from a few hundred to one thousand, a few thousand at the most).
These limitations can, to a certain extent, be overcome by choosing probing enzymes
that recognize tetranucleotide sites containing a CpG dinucleotide. Indeed, it is well
documented that a substantial fraction (> 25%) of the nucleotide substitutions in the
human genome result from C → T transitions in CpG dinucleotides. Such CpG
dinucleotides represent mutational hotspots in vertebrates because a large fraction of
the cytosines are methylated and subsequently mutate to thymine by deamination. It is
estimated that the mutation frequency of methylated cytosines is 6- to 8-fold higher than
average. Hence, probing enzymes that cleave CpG-containing recognition sites will
yield ESPs at correspondingly higher frequencies, estimated at ~5%. However, the
adverse consequence of the high mutation rate is that CpG is relatively rare in
mammalian DNA, occurring with a frequency of 1 in 100 nucleotides [Wang, D. G.
et al., Science 280:1077-1082 (1998)] instead of 1 in 16. Likewise, the frequency of
CpG-containing tetranucleotide sites is 1 in ~1600 instead of 1 in 256 bases. To
compensate for this, a probe endonuclease reagent can be used comprising two or
more of the following complementary restriction enzymes: TaqI (TCGA), MspI
(CCGG), MaeII (ACGT), and HinPI or HhaI (GCGC). It should be noted, however, that
cleavage by MaeII as well as the isoschizomers HinPI and HhaI is blocked by
methylation of the cytosine residue (Cm) within the CpG dinucleotide. These enzymes
will thus only cleave at a fraction of their sites, namely the non-methylated sites.
Analysis of the large amount of publicly accessible human genomic DNA sequence
shows that the cocktail of the 4 enzymes will cleave once in every 400 bp on average.
The total number of sites in the genome is thus on the order of 7.5 million. Assuming
that the ESP frequency is 5%, the enzyme cocktail has the potential of detecting
~375,000 ESPs. In addition to using combinations of restriction endonucleases, one
may also use reaction conditions that decrease the cleavage specificity. Such a strategy
has been applied to obtain a restriction endonuclease reagent, designated CGase I, that
is capable of cleaving DNA at CpG dinucleotides [Mead, D. et al., WO 94/21663].
This CGase I restriction endonuclease reagent may be particularly useful for the analysis
of human polymorphisms using the methods of the present invention.

The example described below illustrates the approach in a limited-scale
assay, which characterizes the human ESPs within CpG-containing tetranucleotide
recognition sites using the sampling enzyme combination PacI-BfaI. The rare cutter
PacI is estimated to have only about 50,000 cleavage sites in the human genome; the
frequent cutter BfaI will generate two fragments per PacI site. The enzyme combination
will, therefore, create a moderately complex set of 100,000 PacI-BfaI target fragments.
This fragment set captures a sizable number of CpG-containing restriction sites,
estimated on the order of 40,000. Assuming a 5% ESP frequency, the number of
detectable ESPs is on the order of 2000. It should be stressed that many different
sampling enzyme combinations can be used and that thus a substantial fraction of the
~375,000 ESPs located within NCGN-type restriction sites can be monitored.

The procedure outlined in this example comprises the following steps:

(1) Development of a set of candidate PacI-BfaI ESP fragments

(2) Genetic analysis of humans using ESP probe fragments
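
For orientation, the order-of-magnitude estimates given earlier in this example for the CpG-specific enzyme cocktail and for the PacI-BfaI assay can be reproduced with simple arithmetic. The Python sketch below uses only figures stated in the text; the variable names are hypothetical, and deriving the spacing of ~1600 by scaling 256 with the CpG depletion factor (100/16) is an assumption about how that figure was obtained.

```python
GENOME_SIZE_BP = 3_000_000_000   # human genome, ~3,000 Mb
BP_PER_COCKTAIL_SITE = 400       # the 4-enzyme cocktail cleaves once per 400 bp
ESP_PERCENT = 5                  # assumed ESP frequency at CpG-containing sites

# Genome-wide estimate for the enzyme cocktail.
cocktail_sites = GENOME_SIZE_BP // BP_PER_COCKTAIL_SITE        # 7,500,000
genome_wide_esps = cocktail_sites * ESP_PERCENT // 100         # 375,000

# CpG depletion: observed 1 in 100 versus 1 in 16 expected for a random dinucleotide.
random_site_spacing = 4 ** 4                                   # 256
depleted_site_spacing = random_site_spacing * 100 // 16        # 1,600

# Limited-scale PacI-BfaI assay.
paci_sites = 50_000
target_fragments = 2 * paci_sites                              # two fragments per PacI site
cpg_sites_in_fragments = 40_000
assay_esps = cpg_sites_in_fragments * ESP_PERCENT // 100       # 2,000

print(cocktail_sites, genome_wide_esps, depleted_site_spacing, target_fragments, assay_esps)
```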

Step 1. Development of a set of PacI-BfaI probe fragments

A mixture of sample fragments, derived from various individuals in the
population, can be divided into three classes with respect to sites for the probing enzyme:
monomorphic fragments that are devoid of a cleavage site, fragments that are always
cleaved, and fragments that carry one polymorphic recognition site. Fragments that are
digested will be referred to as S+ fragments and fragments lacking the site as S-
fragments. Polymorphic ESP fragments will thus be the only fragments present in both
the S+ and S- populations of sampling fragments. This forms the basis for their
selection by batch-wise hybridization: only ESP fragments are capable of annealing
when mixing the S+ and S- fragment collections. The hybridization-selection can be
performed in two different, reciprocal ways: either the S+ fragments can be used to
retrieve the matching S- fragments, or S- fragments are used to collect the
complementary S+ sampling fragments. In one approach, the selected candidate ESP
fragments may be isolated by cloning, arrayed, and subsequently validated by testing
various sample DNAs (e.g., the various sample DNAs used as starting material for the
hybridization-selection). Candidate ESP probe fragments that appear to detect
monomorphic sample fragments may either be removed from the array or retained as
control elements on the array. An alternative approach consists of performing the two
reciprocal hybridization-selections, cloning the selected fragments, and identifying
ESPs by means of matching S+ and S- fragments. The latter strategy is outlined
below.
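
The logic of the selection can be expressed in terms of sets: a fragment that occurs in both the S+ and the S- collection is a candidate ESP fragment. The Python sketch below is a purely conceptual illustration in which fragments are represented by hypothetical identifiers; it models the outcome of the hybridization-selection and not the laboratory procedure itself.

```python
def classify_fragments(s_plus, s_minus):
    """Split fragment identifiers into the three classes described above."""
    s_plus, s_minus = set(s_plus), set(s_minus)
    return {
        "always cleaved": s_plus - s_minus,    # site present in all individuals
        "never cleaved": s_minus - s_plus,     # site absent in all individuals
        "candidate ESP": s_plus & s_minus,     # site present in some individuals only
    }


# Hypothetical fragment identifiers recovered from the two pools.
s_plus_pool = ["frag01", "frag02", "frag05"]
s_minus_pool = ["frag02", "frag03", "frag04"]

classes = classify_fragments(s_plus_pool, s_minus_pool)
print(sorted(classes["candidate ESP"]))  # ['frag02']
```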

(i) Preparation of S+ and S- fragments. The preferred starting
material is an equimolar mixture of genomic DNA from a number of representative
individuals. Such individuals (ranging in number from 5 to 50) may be chosen from various
CEPH (Centre d'Etude du Polymorphisme Humain) pedigrees [Wang, D. G. et al.,
Science 280:1077-1082 (1998)]. Following cleavage of the DNA mixture with the
PacI/BfaI combination of sampling enzymes, appropriate oligonucleotide adapters as
described above are ligated to the fragment ends. This template DNA is divided into two
aliquots, which are treated separately to prepare the S+ and S- fragment mixes, respectively. To
prepare the S- fragment mix, the target DNA fragments are cleaved with the probing
enzyme and then amplified. This will result in a mixture of fragments that do not
contain sites for the probing enzyme. Furthermore, the S- fragment mixture may be
prepared by using one biotinylated primer, such that the resulting PCR product can be
captured onto a solid substrate, such as magnetic beads conjugated with streptavidin.
S+ fragments are prepared by (1) amplifying the mixture of PacI-BfaI fragments, (2)
digesting the PCR product with one of the four NCGN-recognizing enzymes, (3)
ligating appropriate adapters to the ends generated by the probing enzyme (see EP 0
534 858, incorporated herein by reference), and (4) re-amplifying the resulting
material using one primer that recognizes the probe enzyme adapter and one primer that
recognizes one specific sampling enzyme adapter. As with the S- fragments, the
amplification reaction can be performed making use of a biotinylated primer that
matches the probe enzyme adaptor such that the S+ fragment mixture can be
immobilized.
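
The two preparations may be pictured with the following simplified in-silico sketch, which is not part of the experimental protocol. It assumes that fragments are written from the PacI end to the BfaI end, that the probing enzyme recognizes TCGA and cuts after the first base, and that the S+ re-amplification uses the BfaI-side sampling primer; all function names and sequences are hypothetical.

```python
PROBE_SITE = "TCGA"  # assumption: example NCGN-type site
CUT_OFFSET = 1       # assumption: cleavage after the first base of the site


def prepare_s_minus(fragments, site=PROBE_SITE):
    """Digest, then amplify: only fragments without the site remain amplifiable."""
    return [f for f in fragments if site not in f]


def prepare_s_plus(fragments, site=PROBE_SITE, cut_offset=CUT_OFFSET):
    """Amplify, digest, ligate the probe adapter and re-amplify.

    Modelled as the sub-fragment running from the cut nearest the BfaI end
    (right-hand end of the string) to that end.
    """
    products = []
    for f in fragments:
        pos = f.rfind(site)
        if pos != -1:
            products.append(f[pos + cut_offset:])
    return products


fragments = ["AAATCGAGGG", "AAATCAAGGG"]  # invented sequences
s_minus = prepare_s_minus(fragments)
s_plus = prepare_s_plus(fragments)
```

In this toy case the S- mix retains only the fragment lacking the site, while the S+ mix contains the truncated product of the fragment carrying it.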

Two alternative pairs of PacI- and BfaI-adaptors, as well as
corresponding non-selective primers, are used; e.g. set I is used for the amplification
of the S- fragments and set II for the preparation of S+ fragments:

Set I

5'-CTCGTAGACTGCGTACCCAT (SEQ ID NO: 17)
3'-CATCTGACGCATGGG (SEQ ID NO: 18)

5'-GACGATGAGTCCTGAG (SEQ ID NO: 3)
3'-TACTCAGGACTCAT (SEQ ID NO: 4)

5'-GACTGCGTACCCATTA (SEQ ID NO: 19)

5'-GATGAGTCCTGAGTAG (SEQ ID NO: 10)

Set II

5'-GAGCATCTGACGCATGGGAT (SEQ ID NO: 20)
3'-GTAGACTGCGTACCC (SEQ ID NO: 21)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)
3'-ATGAGTCCTGACAT (SEQ ID NO: 14)

5'-CTGACGCATGGGATTA (SEQ ID NO: 22)

5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

The adaptor ligated to the ends generated by the NCGN-cleaving probing enzyme and
the corresponding amplification primer have the following structures:

5'-GTCCTCATCGAGCATG (SEQ ID NO: 23)
3'-AGTAGCTCGTACGC (SEQ ID NO: 24)

5'-CCTCATCGAGCATGCG (SEQ ID NO: 25)
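
Because the lower adaptor strands are printed here in the 3' to 5' direction whereas the Sequence Listing gives every sequence 5' to 3', the two presentations can be cross-checked mechanically. The following is a minimal sketch of such a check; the sequences are copied from this specification and its Sequence Listing, while the helper names are hypothetical.

```python
def complement(seq):
    """Base-by-base complement (no reversal)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


# Lower strands as printed in the specification (3' to 5'), keyed by SEQ ID NO.
SPEC_3_TO_5 = {
    18: "CATCTGACGCATGGG",
    4: "TACTCAGGACTCAT",
    21: "GTAGACTGCGTACCC",
    14: "ATGAGTCCTGACAT",
    24: "AGTAGCTCGTACGC",
}
# The same entries as given in the Sequence Listing (5' to 3').
LISTING_5_TO_3 = {
    18: "GGGTACGCAGTCTAC",
    4: "TACTCAGGACTCAT",
    21: "CCCATGCGTCAGATG",
    14: "TACAGTCCTGAGTA",
    24: "CGCATGCTCGATGA",
}

# Reversing the printed strand should reproduce the listing entry.
orientation_ok = {n: SPEC_3_TO_5[n][::-1] == LISTING_5_TO_3[n] for n in SPEC_3_TO_5}


def duplex_offset(top_5_to_3, bottom_3_to_5):
    """Index in the top strand where the base-paired region starts, or -1."""
    return top_5_to_3.find(complement(bottom_3_to_5))


top_17 = "CTCGTAGACTGCGTACCCAT"  # SEQ ID NO: 17, upper strand of set I
start = duplex_offset(top_17, SPEC_3_TO_5[18])
overhang_3prime = top_17[start + len(SPEC_3_TO_5[18]):]
```

For the first adaptor of set I, the lower strand pairs with the upper strand from its fourth base onward and leaves a two-base single-stranded AT extension at the 3' end of the upper strand.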



(ii) Hybridization-selection step(s). The S- fragment mix is
hybridized to the biotinylated S+ fragments. Following hybridization, the biotinylated
products are captured onto streptavidin-coated magnetic beads. The beads are
repeatedly washed to remove all unhybridized fragments and thereafter the hybridized
S- fragments are eluted. These are then reamplified with the PacI and BfaI primers and
the hybridization-selection procedure is repeated at least once. Finally the amplified
fragments are cloned in an appropriate vector and a series of around 2,000 inserts are
sequenced. To select a set of S+ fragments, this procedure is repeated in reverse, this
time using biotinylated S- fragments. Upon comparison of the S+ and S- sequences, ESP
fragments are readily identified as fragments having partially overlapping sequences
and in which the S- fragment sequence shows a mutated NCGN restriction site at the
internal boundary of the overlap. In this way, >500 ESPs are readily characterized.
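
The sequence comparison described above may be sketched as follows, under several simplifying assumptions that are not taken from the specification: the probing site is TCGA cut after its first base, the S+ insert begins with the remaining three bases of the site, fragments are written so that the shared sampling end is on the right, and the mutation is a single-base change. The names `match_esp`, `hamming` and the example sequences are hypothetical.

```python
PROBE_SITE = "TCGA"  # assumption: example NCGN-type site
CUT_OFFSET = 1       # assumption: cleavage after the first base


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_esp(s_plus, s_minus, site=PROBE_SITE, cut_offset=CUT_OFFSET, min_overlap=8):
    """Return (position, mutated_site) if the pair looks like one ESP, else None."""
    residue = site[cut_offset:]              # part of the site left on the S+ insert
    if not s_plus.startswith(residue):
        return None
    tail = s_plus[len(residue):]             # sequence shared beyond the site
    if len(tail) < min_overlap or not s_minus.endswith(tail):
        return None
    start = len(s_minus) - len(tail) - len(site)
    if start < 0:
        return None
    boundary = s_minus[start:start + len(site)]
    # The S- allele must carry a site that differs from the intact site by one base.
    if hamming(boundary, site) == 1:
        return start, boundary
    return None


tail = "GGCCTTAAGGCC"                 # invented shared sequence
s_plus = "CGA" + tail                 # S+ insert, starting at the cut
s_minus = "AAGGT" + "TCAA" + tail     # S- insert with a mutated site (TCAA)
hit = match_esp(s_plus, s_minus)
```

A pair is reported only when the overlap ends exactly at a position where the S- sequence differs from the intact recognition site by one base.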

Step 2. Genetic analysis of humans using ESP probe fragments 
The sequence-verified ESP fragments are spotted on micro-arrays for 
genetic analysis of human sample DNA. For the preparation of this sample DNA, a pair 
of adaptors/primers is used that differs from those attached to the arrayed S- or S+ set of 
ESP fragments. From each individual, an undigested control sample and a probe enzyme 
digested test sample are prepared. These samples are labeled with Cy3 and Cy5, mixed 
and hybridized to the micro-arrays as described before. Alternatively, the hybridization 
mixture may be composed of differentially labeled test DNA and previously genotyped 
control DNA, both digested with the probing endonuclease. In both cases, the Cy3 
(test/digested sample) and Cy5 (control/undigested DNA) signal intensities are normalized 
using a number of monomorphic control probes. The ratio of these normalized Cy3/Cy5
signals for each of the ESP probes allows accurate determination of the allelic state of the
sample at each polymorphic site (homozygous S+/S+, homozygous S-/S-, heterozygous
S+/S-).
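
A minimal sketch of such ratio-based allele calling is given below for illustration. It assumes that the monomorphic control probes correspond to fragments lacking the probing site (so that their normalized ratio is 1), that the digested test sample is labeled with Cy3 and the undigested control with Cy5, and that cut-offs of 0.25 and 0.75 separate the three genotype classes; these thresholds, the function names and the intensity values are hypothetical.

```python
from statistics import median


def normalization_factor(control_probes):
    """Median Cy3/Cy5 ratio over monomorphic control probes lacking the site."""
    return median(cy3 / cy5 for cy3, cy5 in control_probes)


def call_genotype(cy3, cy5, factor, low=0.25, high=0.75):
    """Call the allelic state from the normalized Cy3 (digested) / Cy5 (undigested) ratio."""
    ratio = (cy3 / cy5) / factor
    if ratio < low:
        return "S+/S+"   # both alleles cleaved: essentially no amplification
    if ratio > high:
        return "S-/S-"   # neither allele cleaved: full signal
    return "S+/S-"       # one allele cleaved: roughly half signal


controls = [(200.0, 100.0), (400.0, 200.0), (210.0, 100.0)]  # invented intensities
factor = normalization_factor(controls)
calls = [
    call_genotype(10.0, 100.0, factor),
    call_genotype(100.0, 100.0, factor),
    call_genotype(190.0, 100.0, factor),
]
```

In practice the cut-offs would have to be calibrated on samples of known genotype.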

The micro-array hybridization experiment may first be
performed with the sample DNAs, derived from a collection of individuals, from which
the ESP probe fragments were isolated. Such an experiment will confirm
the polymorphic nature of the selected probe fragments and allow their testing in a
multitude of measurements. The data will also yield information on the allele frequencies
among an appreciable number of chromosomes.

Example 4

Human genetic analysis using format-II RAA

As described for corn in Example 2, the format-I ESP assay for human
genetic analysis may be converted to a format-II or a format-III assay. Based on the
sequence of the selected and experimentally validated ESP fragments, it is indeed
possible to design a pair of dedicated, i.e. ESP-specific, PCR primers. Such primers
can be combined in a number of parallel multiplex reactions, which are in turn
combined to obtain the sample DNA [Wang, D. G. et al., Science 280:1077-1082
(1998)]. This sample DNA is hybridized against a micro-array of spotted S+ ESP
fragments (see Example 3). The experiment is set up such that the fluorescently
labeled ESP-specific primer and the S+ sequences are located on opposite sides of the
polymorphic site. Alternatively, the unlabeled ESP-specific amplification primers may
be arrayed as hybridization probes. The development of a format-II or format-III assay
need not be preceded by the identification of ESP fragments (using one of the methods
described in the previous examples). In the present example, we describe the
development of an RAA assay based on the sequence of previously discovered SNPs.

Close inspection of the known SNPs reveals that a significant percentage
of them are associated with both the loss and gain of a restriction recognition site, i.e.
each of two allelic sequences is associated with a different restriction recognition site.
The single nucleotide substitution may inter-convert recognition sequences that are
identical except for one nucleotide [e.g. PleI (GACTC) and HgaI (GACGC), HgaI and
SfaNI (GATGC), SfaNI and BbvI (GCTGC)]. Alternatively, the allelic recognition
sites may be partially overlapping [e.g. MaeII (ACGTg) and NlaIII (aCATG); in the
latter case the inter-conversion depends on the nature of the upstream or downstream
sequences]. Such mutually exclusive restriction site allelism offers a distinct advantage.
The RAA technique will normally only detect the allele that is devoid of a recognition
site for the probing enzyme; therefore, determination of the zygosity requires careful
calibration of the signal against that observed with undigested control DNA. When each
allele is associated with the presence/absence of a restriction site, two parallel RAA
assays can be performed, each involving digestion with one of the alternative enzymes.
With such an assay, both alleles can be positively identified and the zygosity is readily
determined. The two parallel assays are best performed in a two-color mode; one of
the primers is differentially labeled (e.g. with Cy3 and Cy5 as described previously)
such that the amplification reactions can be mixed and hybridized against a single array
of probes.
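
For illustration, the search for SNPs whose two alleles create different recognition sites can be sketched as below. The recognition sequences are those of the enzymes named above, each entered in one orientation and scanned on both strands; the flanking sequences, the function names and the data layout are hypothetical, and only sites that overlap the variable position are reported.

```python
ENZYMES = {
    "PleI": "GAGTC",
    "HgaI": "GACGC",
    "SfaNI": "GCATC",
    "BbvI": "GCAGC",
    "MaeII": "ACGT",
    "NlaIII": "CATG",
}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sites_spanning(seq, pos, enzymes=ENZYMES):
    """Enzymes with a recognition site (either strand) overlapping index pos."""
    found = set()
    for name, site in enzymes.items():
        for motif in {site, revcomp(site)}:
            i = seq.find(motif)
            while i != -1:
                if i <= pos < i + len(motif):
                    found.add(name)
                i = seq.find(motif, i + 1)
    return found


def allelic_sites(left, alleles, right):
    """Map each allele of a SNP to the enzymes whose site spans the SNP."""
    return {a: sites_spanning(left + a + right, len(left)) for a in alleles}


hga_sfa = allelic_sites("AAGA", ("C", "T"), "GCTT")    # invented flanks
mae_nla = allelic_sites("TTAC", ("G", "A"), "TGCC")    # invented flanks
```

A SNP is a candidate for the two-enzyme assay when each allele maps to a different, non-empty set of enzymes, as in both toy cases above.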

We have systematically explored the SNP database of the Whitehead
Institute for mutational changes that promote restriction site inter-conversions and have
calculated their occurrence frequency. Two SNP-associated recognition site
inter-conversions were found to occur at high frequency: MaeII -> NlaIII and HgaI ->
SfaNI. In both cases the mutational changes converting one site into another are C->T
(or G->A) transitions occurring in CpG dinucleotides. This finding is entirely consistent
with the fact that this type of mutation occurs with a 6-8 times higher frequency than other
nucleotide substitutions. Based on the number of SNPs found in the Whitehead database,
we estimate the total number of SNPs in the human genome for the enzyme pairs
MaeII/NlaIII and HgaI/SfaNI at 30,000 and 15,000, respectively. These numbers are
presumably somewhat overestimated since both MaeII and HgaI are susceptible to CpG
methylation. Consequently the inter-conversion can only be measured at the
non-methylated sites. Therefore, in practice, RAA assays designed on the basis of sequence
data should be validated on a number of test samples. Assays in which no cleavage takes
place at the CpG-containing site in any of the individuals tested should be eliminated
from the RAA bi-allelic marker systems.

The foregoing examples are illustrative of the invention and are not intended to limit
the scope of the invention as set out in the claims. All of the references cited herein are
incorporated by reference.

WE CLAIM:

1. A method for detecting an endonuclease site polymorphism (ESP) in
DNA, the method comprising:

(a) isolating sample DNA;

(b) deriving a set of concomitantly amplifiable target DNA fragments
from the sample DNA;

(c) treating the target DNA fragments obtained in step (b) with a
probe restriction endonuclease reagent;

(d) amplifying the probe restriction endonuclease reagent treated
target DNA fragments of step (c);

(e) analyzing the DNA of step (d) to determine which target
fragments are amplified and/or which target fragments are not amplified; and wherein
target DNA fragments which are amplified lack a recognition site for the probe
restriction endonuclease reagent and target fragments having a recognition site for the
probe restriction endonuclease reagent are not amplified.

2. The method of claim 1 wherein the concomitantly amplifiable target DNA
fragments of step (b) are derived by treatment of the sample DNA with a sampling
restriction endonuclease reagent.

3. The method of claim 2 wherein the concomitantly amplifiable DNA
fragments of step (b) are derived from sample DNA by treatment of the sample DNA
with a first and a second restriction endonuclease reagent.

4. The method of claim 3 wherein said first restriction endonuclease
reagent has a recognition sequence of six or more nucleotides and the second restriction
endonuclease reagent has a recognition sequence of four or fewer nucleotides.



5. The method of claim 3 or 4 wherein said concomitantly amplifiable
target DNA fragments are derived by stepwise treatment of said sample DNA with the
first and the second restriction endonuclease reagents.

6. The method of claim 1 further comprising preparing PCR primers
which flank the endonuclease site polymorphism (ESP) for use in amplifying said
concomitantly amplifiable target DNA fragments.

7. The method of claims 1, 2, 3, and 4 wherein the concomitantly
amplifiable DNA fragments are modified by ligation of adapters to both termini of said
fragments, and wherein said adaptors are capable of serving as primers for
amplification.

8. The method of claim 5 wherein the concomitantly amplifiable DNA
fragments are modified by ligation of adapters to both termini of said fragments, and
wherein said adaptors are capable of serving as primers for amplification.

9. The method of claim 1 wherein the probe restriction endonuclease
reagent of step (c) has a recognition sequence comprising six or more nucleotides.

10. The method of claim 1 wherein the probe restriction endonuclease
reagent of step (c) has a recognition sequence comprising four or more nucleotides.

11. The method according to claim 1 wherein the probe restriction
endonuclease of step (c) has a recognition sequence of two nucleotides.

12. The method according to claim 1 wherein the order of the steps (b) and
(c) are reversed or carried out simultaneously.

13. The method according to claim 1 wherein said endonuclease site
polymorphism is an alteration in a concomitantly amplifiable target fragment giving
rise to a nucleotide sequence that is recognized and cut by the probe restriction
endonuclease reagent.

14. The method of claim 1 wherein said site polymorphism is an alteration
in the nucleotide sequence of a concomitantly amplifiable target fragment which
eliminates a recognition sequence for said probe restriction endonuclease reagent.

15. The method of claims 1, 2, 3 and 4 wherein said concomitantly
amplifiable DNA fragments are amplified by a polymerase chain reaction.

16. The method of claim 5 wherein said concomitantly amplifiable DNA
fragments are amplified by a polymerase chain reaction.

17. The method of claim 1 wherein amplified target fragments are
identified by their ability to hybridize to cognate probe DNA fragments.



18. A method for obtaining probe DNA fragments for use in detecting
endonuclease site polymorphisms, the method comprising:

(a) isolating sample DNA;

(b) deriving a set of concomitantly amplifiable target DNA fragments
from the sample DNA;

(c) selecting from the target DNA fragments, probe DNA fragments
having an endonuclease site polymorphism (ESPs) for the probe restriction
endonuclease.

19. The method of claim 17 wherein said probe DNA fragments are derived
by digestion of sample DNA with one or more sampling restriction endonuclease
reagents.

20. The method of claim 18 wherein probe DNA fragments are derived by
digestion of a pool of sample DNAs obtained from one or more individuals of a
species.

21. The method of claim 18 wherein the probe DNA fragments are derived
by digestion of a pool of sample DNAs obtained from 10 or more individuals of a
species.

22. The method of claim 18 wherein the probe DNA fragments are derived by
digestion of a pool of sample DNAs obtained from a pool of 50 or more individuals of
a species.

23. The method of any one of claims 19-21 wherein said species is selected
from the group consisting of procaryotic species and eucaryotic species.

24. A method for obtaining probe DNA fragments for use in detecting
endonuclease site polymorphisms (ESP) comprising preparing synthetic
oligonucleotides based on the nucleotide sequence of amplifiable target DNA fragments
containing endonuclease site polymorphism(s).

25. A method for producing a microarray of probe DNA, the method
comprising:

(a) isolating sample DNA;

(b) deriving a set of concomitantly amplifiable target DNA fragments
from the sample DNA;

(c) selecting probe DNA fragments having restriction endonuclease site
polymorphisms (ESPs) from the sample restriction endonuclease treated target DNA
fragments of step (b); and

(d) arraying the probe DNA fragments obtained in step (c) on a solid
substrate in a predefined region by attaching the fragments to the substrate.

26. The method of claim 24 wherein the DNA fragments of step (b) are
obtained by treating sample DNA with one or more sample restriction endonuclease
reagents.

27. The method of claim 24 wherein the said probe DNA fragments of step
(d) are synthetic oligonucleotides which correspond to the concomitantly amplifiable
target DNA fragments derivable from said sample DNA and containing an
endonuclease site polymorphism (ESP).

28. The method of claim 25, 26 or 27 wherein the solid support is selected
from a group consisting of a planar solid support, a bead, a sphere and a polyhedron.

29. The method of claim 25 wherein the microarray comprises at least 2,000
probe fragments.

30. The method of claim 26 wherein the microarray comprises at least 2,000
synthetic oligonucleotides.

31. The method of claim 27 wherein the microarray comprises at least 2,000
probe fragments.

32. The method of claim 28 wherein the microarray comprises at least 2,000
probe fragments.

33. The method of claim 25 wherein the microarray comprises at least
20,000 probe fragments.

34. The method of claim 26 wherein the microarray comprises at least
20,000 synthetic oligonucleotides.

35. The method of claim 27 wherein the microarray comprises at least
20,000 probe fragments.

36. The method of claim 28 wherein the microarray comprises at least
20,000 probe fragments.



[Drawing sheets 6/8, 7/8 and one further sheet of WO 00/28081 (PCT/IB99/01958), marked SUBSTITUTE SHEET (RULE 26); the figure content is not recoverable from the scanned images.]




SEQUENCE LISTING

<110> METHEXIS N.V.

<120> RESTRICTED AMPLICON ANALYSIS

<130> 29314/34158A

<140>
<141>

<150> 60/107,293
<151> 1998-11-09

<160> 28

<170> PatentIn Ver. 2.0

<210> 1
<211> 17
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 1
ctcgtagact gcgtacc 17

<210> 2
<211> 18
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<223> As presented in the Sequence Listing the
      nucleotide sequence reads in the 5' to 3'
      direction. As presented in the specification the
      sequence reads in the 3' to 5' direction.
<400> 2
aattggtacg cagtctac 18

<210> 3
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 3
gacgatgagt cctgag 16

<210> 4
<211> 14
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<223> As presented in the Sequence Listing the
      nucleotide sequence reads in the 5' to 3'
      direction. As presented in the specification the
      sequence reads in the 3' to 5' direction.
<400> 4
tactcaggac tcat 14

<210> 5
<211> 18
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<221> misc_feature
<222> (17)
<223> At position 17 N = A, C, G, or T
<220>
<221> misc_feature
<222> (18)
<223> At position 18 N = A, C, G, or T
<400> 5
gactgcgtac caattcnn 18

<210> 6
<211> 19
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<221> misc_feature
<222> (17)
<223> At position 17 N = A, C, G, or T
<220>
<221> misc_feature
<222> (18)
<223> At position 18 N = A, C, G, or T
<220>
<221> misc_feature
<222> (19)
<223> At position 19 N = A, C, G, or T
<400> 6
gatgagtcct gagtagnnn 19

<210> 7
<211> 21
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 7
ctcgtagact gcgtacatgc a 21

<210> 8
<211> 14
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<223> As presented in the Sequence Listing the
      nucleotide sequence reads in the 5' to 3'
      direction. As presented in the specification the
      sequence reads in the 3' to 5' direction.
<400> 8
tgtacgcagt ctac 14

<210> 9
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 9
gactgcgtac atgcag 16

<210> 10
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 10
gatgagtcct gagtag 16

<210> 11
<211> 21
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 11
gagcatctga cgcatgttgc a 21

<210> 12
<211> 14
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<223> As presented in the Sequence Listing the
      nucleotide sequence reads in the 5' to 3'
      direction. As presented in the specification the
      sequence reads in the 3' to 5' direction.
<400> 12
acatgcgtca gatg 14

<210> 13
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 13
ctgctactca ggactg 16

<210> 14
<211> 14
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<223> As presented in the Sequence Listing the
      nucleotide sequence reads in the 5' to 3'
      direction. As presented in the specification the
      sequence reads in the 3' to 5' direction.
<400> 14
tacagtcctg agta 14

<210> 15
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 15
ctgacgcatg ttgcag 16

<210> 16
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 16
ctactcagga ctgtag 16

<210> 17
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 17
ctcgtagact gcgtacccat 20

<210> 18
<211> 15
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<223> As presented in the Sequence Listing the
      nucleotide sequence reads in the 5' to 3'
      direction. As presented in the specification the
      sequence reads in the 3' to 5' direction.
<400> 18
gggtacgcag tctac 15

<210> 19
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 19
gactgcgtac ccatta 16

<210> 20
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 20
gagcatctga cgcatgggat 20

<210> 21
<211> 15
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<223> As presented in the Sequence Listing the
      nucleotide sequence reads in the 5' to 3'
      direction. As presented in the specification the
      sequence reads in the 3' to 5' direction.
<400> 21
cccatgcgtc agatg 15

<210> 22
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 22
ctgacgcatg ggatta 16

<210> 23
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 23
gtcctcatcg agcatg 16

<210> 24
<211> 14
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<223> As presented in the Sequence Listing the
      nucleotide sequence reads in the 5' to 3'
      direction. As presented in the specification the
      sequence reads in the 3' to 5' direction.
<400> 24
cgcatgctcg atga 14

<210> 25
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 25
cctcatcgag catgcg 16

<210> 26
<211> 17
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 26
gagcatctga cgcatcc 17

<210> 27
<211> 18
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<220>
<223> As presented in the Sequence Listing the
      nucleotide sequence reads in the 5' to 3'
      direction. As presented in the specification the
      sequence reads in the 3' to 5' direction.
<400> 27
aattggatgc gtcagatg 18

<210> 28
<211> 16
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: primer
<400> 28
ctgacgcatc caattc 16

RESTRICTED AMPLICON ANALYSIS



Field of the Invention 

The present invention generally provides a method which facilitates the 
detection of polymorphisms (or mutations). The method is directed to the analysis of 
so-called endonuclease site polymorphisms (ESPs) that result in the gain or loss of a 
restriction endonuclease site. In essence, the ESP is probed with the restriction 
endonuclease reagent prior to amplification, whereby amplification is prevented and 
consequently no signal is observed when cleavage takes place. Unambiguous allele 
calling is performed by comparing the signals obtained with and without cleavage with 
the restriction endonuclease reagent. The method is particularly useful for multiplex 
genotyping, involving the parallel analysis of large numbers of single nucleotide 
polymorphisms. Preferred methods for detecting the amplicons involve hybridization
to an arrayed or otherwise identifiable set of cognate probe fragments or
oligonucleotides.

Background of the Invention

Molecular approaches for genetic analyses trace the nucleotide sequence
variation that occurs naturally and randomly in the genomes of all living species.
Knowledge of the DNA polymorphisms among individuals and between populations
is important in understanding the complex links between genotypic and phenotypic
variation. In the absence of complete data about sequence variation, one relies on the
ability to identify 'nearby' markers that allow one to infer the location of certain relevant
loci or causal sequence variations. The informativeness of the marker depends on the
magnitude of the linkage disequilibrium. Markers can be used in linkage studies to
search for candidate genes and in association studies to identify the functional allelic
variation in candidate genes that influences inter-individual variation.

The vast majority of sequence variation consists of nucleotide
substitutions, often referred to as single nucleotide polymorphisms (SNPs), resulting
from mutations that have accumulated during evolution. Most of these nucleotide
changes are genetically silent; i.e., they have no measurable biological effect, but
provide an immense reservoir of variation in DNA structure. Most methods for genetic
analysis used today rely on the detection of nucleotide sequence variation which can
be measured by DNA fragment analysis using electrophoretic separation, in which
DNA fragments are fractionated based on size or conformation. Occasionally the
nucleotide sequence variation will affect either the presence of the DNA fragment or
its mobility. In this way the primary nucleotide sequence variation will give rise to
easily detectable DNA fragment polymorphism. Since polymorphic DNA fragments
are derived from precise locations on the organism's genome, they can serve as reliable
genetic markers, or landmarks to identify and locate genes.

A host of assays to detect DNA polymorphisms, and SNPs in particular,
have been developed. In some of these assays (e.g., RFLP [Botstein, D., White, R.L.,
Skolnick, M., Davis, R.W., Am. J. Hum. Genet. 32:314-331 (1980)], CAPS
[Konieczny, A., Ausubel, J.F., Plant J. 4:403-410 (1993)], dCAPS [Neff, M.M., Neff,
J.D., Chory, J., Pepper, A.E., The Plant Journal 14:387-392 (1998)], PIRA
[Steinborn, R., Muller, M., Brem, G., Biochim. Biophys. Acta 1397:295-304 (1998)]),
restriction enzymes are used to detect polymorphic nucleotide sequences that affect
cleavage. The specificity of restriction enzymes is such that they exhibit a unique
sensitivity to detect single nucleotide differences occurring in their recognition sites.
The principal strengths of restriction enzyme-based genetic analyses are the ease of use
and the robustness of the assays. In the majority of the cases, the restriction site
polymorphism is used to detect known, previously identified SNPs and the assay
consists of an electrophoretic fragment analysis. In one report, the allelic variation
is detected in a solid-phase ELISA-type setting [Truett, G.E., Walker, J.A., Wilson,
J.B., Redmann, S.M. Jr., Tulley, R.T., Eckardt, G.R., Plastow, G., Mamm. Genome
9:629-632 (1998)].

In WO 91/17269, Lerner et al. describe a different method for mapping
a eukaryotic chromosome by restriction endonuclease mapping of discrete DNA
sequences which are complementary to a region of a eukaryotic chromosome.

Vos et al., Nucl. Acids Res. 23:4407-4414 (1995) and EP 0 534 858
describe a technique for DNA fingerprinting called AFLP which is based on the
selective polymerase chain reaction based amplification of restriction fragments of a
digest of genomic DNA. The amplification reaction depends on the use of primers that
extend into restriction fragments, amplifying only those fragments in which the primer
extensions match the nucleotide sequence flanking the restriction sites.

Another method utilizing DNA amplification steps is set out in Williams
et al., Nucl. Acids Res. 18:6531-6535 (1990), who describe a DNA fingerprinting
method termed random amplified polymorphic DNA.

DNA amplification fingerprinting was described by Caetano Anolles in
Bio/Technology 9:553-557 (1991). Still another fingerprinting technique called
arbitrarily primed PCR was described in Welsh et al., Nucl. Acids Res. 18:7213-7218
(1990) and Welsh et al., Nucl. Acids Res. 19:861-866 (1991).

In WO 94/11530, Cantor et al. describe materials and methods for
positional sequencing by hybridization. Cantor et al. also describe methods for
creating arrays of DNA probes useful in the practice of their method.

The major shortcoming of the current methods of genetic analysis is the
limited resolution of the DNA fragment analysis systems, namely the number of DNA
fragments that can be separated in a single assay. Generally the fractionation resolution
ranges from tens to a couple of hundred DNA fragments at the most. Consequently,
current genetic analysis methods are limited to a few hundred to a thousand genetic
markers. While this resolution has been sufficient for analyzing simple genetic traits
determined by single genes, the analysis of complex traits, which is now being
undertaken and which involve several or many different genes, will require the analysis
of a much larger number of genetic markers. It is anticipated that such studies will
require from a few thousand to possibly several hundred thousand genetic markers.
Although this could conceivably be accomplished by performing many parallel assays,
such scaling up will be cost- and labor-prohibitive.

A technology that has great potential and which is generating widespread
interest is the so-called micro-array technology (DNA chips). In general, these
methods are based on measurement of the hybridization of DNA sequences in solution
to probe sequences that are arrayed on a solid surface. When assaying nucleotide
polymorphisms, the detection relies on the small differences in hybridization efficiency
between two different DNA sequences. In one format, fluorescently labeled sample
DNA is hybridized to dense arrays of probe nucleic acids, sequence-specific
hybridization signal is detected by scanning confocal microscopy, and DNA variants are
scored as (predictable) differences in the hybridization pattern. The micro-arrays are
fabricated either by in-situ light-directed oligonucleotide synthesis [Fodor, S.P.A. et
al. Science 251: 767 (1991)] or by spotting DNA (off-chip synthesized
oligonucleotides or PCR fragments) in an automated procedure. The technology has
already been demonstrated in the scoring of mutations in mitochondrial DNA [Chee,
M. et al. Science 274: 610-614 (1996)], the HIV genome [Lipshutz, R.J. et al.
Biotechniques 19: 442-447 (1995)], the CFTR cystic fibrosis gene [Cronin, M.T. et
al. Human Mut. 7: 244-255 (1996)], the BRCA1 breast cancer gene [Hacia, G.H. et
al. Nat. Genet. 14: 441-447 (1996)] as well as the entire yeast genome [Winzeler,
E.A. et al. Science 281:1194 (1998)]. In comparison with most other assays,
micro-arrays provide a platform for high-throughput, massively parallel polymorphism
detection.

A major disadvantage with the use of microarrays relates to the 
complexity of the hybridization reaction. The detection relies on the very small 
difference in hybridization of DNA sequences differing by only one nucleotide. In 
general, a set of 4 oligonucleotides, differing only in the identity of the central base, 
is synthesized for each position in the target sequence that has to be interrogated. In 
practice, the number of oligonucleotides needed to correctly genotype one SNP is much 
larger, involving up to 56 different oligonucleotides spanning the variable base [Wang 
et al., Science 280: 1077-1082 (1998)]. The degree of redundancy is also dramatic if 
one wants to screen the target DNA for all possible mutations; the design then includes 
overlapping oligonucleotide sets that are offset by one base (a process known as tiling). 
It should be noted that the detection of SNPs by hybridization to arrays depends on the 
use of short oligonucleotide probes. With longer probes such as DNA fragments in the 
size range of 50 to 500 base pairs or larger, it is not possible to distinguish the SNP 
alleles. 
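
The probe redundancy described above can be illustrated with a minimal sketch. It assumes that probes are written as the target-strand sequence (probes on a real array would be complementary to the labeled sample), and the function names `probe_set` and `tiled_probes`, the example target and the probe length are hypothetical choices made for illustration only; they are not taken from the cited references.

```python
def probe_set(target, position, probe_length):
    """Return the 4 probes centered on `position`, one per possible central base."""
    half = probe_length // 2
    start = position - half
    if probe_length % 2 == 0 or start < 0 or start + probe_length > len(target):
        raise ValueError("probe must be odd-length and fit inside the target")
    window = target[start:start + probe_length]
    # Substitute each of the four bases at the central position of the window.
    return [window[:half] + base + window[half + 1:] for base in "ACGT"]


def tiled_probes(target, probe_length):
    """Tile the target: one 4-probe set per interrogated position, offset by one base."""
    half = probe_length // 2
    return {pos: probe_set(target, pos, probe_length)
            for pos in range(half, len(target) - half)}


target = "ACGTACGTACGTA"            # hypothetical 13-base target
central = probe_set(target, 6, 5)   # probes interrogating position 6
tiling = tiled_probes(target, 5)    # probe sets for every position that fits
```

Even for this toy target, tiling every interrogable position requires four probes per position, which shows why the oligonucleotide count grows quickly with target length.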

Summary of the Invention 
The present invention is directed to methods for genotyping 
polymorphisms that result in the gain or loss of an endonuclease cleavage site. Such 
polymorphisms are referred to hereinafter as endonuclease site polymorphisms (ESPs). 
Polymorphisms detectable according to the methods of the present invention include 
single nucleotide polymorphisms (SNPs). The methods of the present invention exploit 
the high discriminatory power of restriction enzymes in a "Restricted Amplicon Assay" 
(RAA) which generally comprises the following steps (see Figure 1): 

(a) isolating sample DNA; 

(b) deriving a set of target DNA fragments, said set of target 
fragments comprising concomitantly amplifiable target DNA fragments from the 
sample DNA; 

(c) treating the target DNA fragments obtained in step (b) with a probe 
restriction endonuclease reagent; 

(d) amplifying the amplifiable probe restriction endonuclease reagent 
treated target DNA fragments of step (c); and 

(e) analyzing the DNA of step (d) to determine which target 
fragments are amplified and/or which target fragments are not amplified; and wherein 
amplified target fragments lack a recognition site for the probe restriction endonuclease 
reagent and target fragments having a recognition site for a probe restriction 
endonuclease reagent are not amplified. 
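
The outcome of steps (c) to (e) can be sketched in silico as follows. This is a simplified illustration, not the claimed method: it assumes complete digestion, treats amplification as all-or-nothing, and checks a single strand only, which is sufficient for a palindromic site such as TTAA (the MseI recognition sequence). The `Fragment` class, the function name and the example sequences are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    sequence: str


def restricted_amplicon_assay(fragments, probe_site):
    """Map each fragment name to True if it would still amplify after probing.

    Assumption: a fragment containing the probe recognition site is cut
    between its primer sites and therefore fails to amplify.
    """
    return {f.name: probe_site not in f.sequence for f in fragments}


fragments = [
    Fragment("allele_A", "GATTACAGTTAACCGT"),  # carries the TTAA site
    Fragment("allele_B", "GATTACAGTCAACCGT"),  # site lost by a single-base change
]
result = restricted_amplicon_assay(fragments, "TTAA")
```

In this toy case only `allele_B` survives probing, so the presence or absence of an amplification product reports the allelic state at the site.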

In one aspect, the present invention is directed to RAA-methods, which 
comprise the preparation of concomitantly amplifiable DNA segments by digestion of 
the starting DNA with one or more restriction endonucleases, collectively referred to 
herein as sampling enzymes. This method is herein referred to as format-I RAA and 
is diagrammed in Figure 2. The digested starting DNA may be further modified at its 
termini by the addition of adapters, which may serve to prime an amplification reaction 
(see Figure 2). Once sample DNA is obtained, it is treated with a different restriction 
enzyme, the probing enzyme, also referred to as a probe restriction endonuclease 
reagent. A combination of probing and sampling enzymes is chosen such that a 
substantial fraction of the sample fragments contain a single recognition site for the 
probe endonuclease reagent. In general, probe enzymes used with format-I RAA 
preferably have as a recognition site a nucleotide sequence of less than six nucleotides. 

In another aspect, the present invention is directed to methods for 
format-II RAA for the detection of ESPs, as diagrammed in Figure 3. Format-II RAA 
operates on the same principle as format-I RAA except that the sample amplicons need 
not be DNA fragments, but are rather defined regions of a genome amplifiable with 
specific primer pairs. The amplicons of the format-II RAA are identified on the basis 
of sequence data, e.g., the sequence of ESP-containing restriction fragments identified 
using the format-I RAA method or otherwise known SNPs affecting endonuclease cleavage 
sites. In format-II RAA, the test DNA to be analyzed is treated with a probe restriction 
endonuclease reagent, followed by the concomitant amplification of regions of the 
treated DNA (amplicons) with predetermined primers using, for example, the 
polymerase chain reaction as described herein. The analysis of the amplification 
products then proceeds as described in the format-I RAA methods described herein. 
As with format-I RAA, an ESP is genotyped by the presence or absence of a 
recognition site for the probe restriction endonuclease reagent. 

In yet another aspect, the present invention is directed to methods for 
format-III RAA. In essence, format-III RAA consists of a combination of the format-I 
and format-II approaches. One such combination is diagrammed in Figure 4. Test 
DNA, digested or not with a probe endonuclease reagent, is sampled with a pair of 
endonuclease reagents and the resulting fragments are co-amplified as described in the 
format-I assay (this step is referred to as the pre-amplification step). These 
pre-amplification mixtures are, in turn, used as templates for a format-II type of PCR 
reaction in which multiple ESP-containing regions are selectively co-amplified using 
specific primer sets. The analysis of the amplification products then proceeds as 
described before. The advantages of format-III RAA are that the stepwise 
amplification facilitates the multiplex PCR of the ESP-specific amplicons and lowers 
the amount of starting material required to interrogate all the ESPs. 

Arrays or microarrays of probe DNA, wherein the probe DNAs are 
useful in the detection of ESPs, are also encompassed by the present invention. 
Informative probe DNAs are prepared and identified as described in detail below and 
are then attached to a substrate for use in the hybridization reactions with concomitantly 
amplifiable DNA after treatment with a probe restriction endonuclease reagent and 
subsequent amplification. 

Since the method of the invention is based on the detection of a 
particular kind of DNA polymorphism, which occurs in DNA of any organism, the 
invention will be universally applicable. The methods of the present invention may be 
used to genotype ESPs in a wide variety of organisms from prokaryotic organisms, 
such as bacteria, through complex eukaryotic organisms, viruses, or any organism 
having a genome however simple or complex. The methods may also be used for the 
analysis of extrachromosomal DNA, the DNA found in certain cellular organelles, 
cDNA preparations, or DNA libraries, such as yeast artificial chromosome libraries 
and others. Furthermore, based on the large body of DNA sequence data at hand, it 
is predicted that the genomes of higher organisms carry several hundreds of thousands 
of such DNA polymorphisms. Consequently, the new method is capable of diagnosing 
the immense number of genetic markers that are needed to unravel complex traits. The 
method is of tremendous value for high throughput genetic analysis in the emerging 
field of pharmacogenomics. Similarly, the method has great potential in the field of 
animal and plant breeding, where high resolution genetic analysis will be needed to 
identify the genes involved in quantitative agronomic traits. 

Various aspects of the present invention are described in more detail 
below (see Detailed Description of the Invention). Variations in each of these aspects 
will be readily appreciated by one of ordinary skill in the art and are within the scope 
of the invention. 

Brief Description of the Drawings 
Figure 1 depicts the general concept of the Restricted Amplicon Assay. 
The vertical arrows indicate the positions of the ESPs. The open circles denote the 
probing enzyme sites that are present, while the closed circles denote the mutated sites. 
The first step involves cleavage of the test DNA with the probing endonuclease. The 
second step involves PCR amplification of DNA segments comprising the ESPs. The 
small horizontal arrows denote the PCR primers flanking the ESPs. When cleavage 
occurs the DNA is cut between the PCR primers, preventing the subsequent 
amplification of the DNA segment comprising those ESPs. Only those DNA segments 
that were not cleaved are amplified. The final step comprises assaying the amplicons. 

Figure 2: Diagrammed representation of format-I RAA. The vertical 
arrows indicate the positions of the ESPs, with the open and closed circles denoting the 
probing enzyme sites that are respectively present and absent. Step 1 represents the 
sampling enzyme cleavage step. The vertical dotted arrows indicate the positions of the 
sampling enzyme cleavage sites. Step 2 represents the adapter ligation step. The open 
lines represent the adapters ligated to the ends of the sampled restriction fragments. 
Step 3 represents the probing enzyme cleavage step and the small horizontal arrows 
denote the PCR primers matching the adapter sequences. Step 4 represents the PCR 
amplification step in which only the sample fragments that are not cleaved by the 
probing enzyme are amplified. The crossed circles represent the fragments that are not 
amplified. 

Figure 3: Diagrammed representation of format-II RAA. The vertical 
arrows indicate the positions of the ESPs, with the open and closed circles denoting the 
probing enzyme sites that are respectively present and absent. Step 1 represents the 
probing enzyme cleavage step. The dotted boxes denote the DNA sequences flanking 
the ESP sites. Step 2 represents the PCR primer design. The small horizontal arrows 
denote the PCR primers flanking the ESPs. Step 3 represents the PCR amplification step 
in which only the sample fragments that are not cleaved by the probing enzyme are 
amplified. The crossed circles represent the fragments that are not amplified. 

Figure 4: Diagrammed representation of format-III RAA. The vertical 
arrows indicate the positions of the ESPs, with the open and closed circles denoting the 
probing enzyme sites that are respectively present and absent. Step 1 represents the 
sampling enzyme cleavage step. The vertical dotted arrows indicate the positions of the 
sampling enzyme cleavage sites. Step 2 represents the pre-amplification step in which 
the sampled fragments are amplified. Step 3 represents the probing enzyme cleavage 
step. Step 4 represents the PCR primer design. The small horizontal arrows denote the 
PCR primers flanking the ESPs. Step 5 represents the PCR amplification step in which 
only the sample fragments that are not cleaved by the probing enzyme are amplified. 
The crossed circles represent the fragments that are not amplified. 

Figure 5: Graphic representation of target fragments produced by 
cleavage with a hexacutter (full arrows) and a tetracutter (dotted arrows) restriction 
enzyme. Two types of fragments are produced: type I fragments (dotted lines) carrying 
two tetracutter ends and type II fragments (full lines) carrying one hexacutter end 
(represented by the arrowhead) and one tetracutter end. Upon PCR amplification only 
the type II fragments are amplified. 

Figure 6: EcoRI-BfaI fragments from ecotypes Columbia (C) and 
Landsberg (L) obtained after selective amplification using EcoRI and BfaI AFLP 
primers with respectively 2 and 3 selective nucleotides. The fragment patterns were 
obtained respectively without probing enzyme (no enzyme) and after digestion with the 
MseI probing enzyme. It is noted that most of the larger fragments do not survive after 
MseI digestion, while the majority of the smaller fragments survive the treatment. The 
differences between the ecotypes Columbia (C) and Landsberg (L) observed after MseI 
digestion, marked by the arrows, represent ESP-carrying fragments. The differences 
found without MseI digestion, marked by the stars, represent typical AFLP 
polymorphisms. 

Figure 7: False color hybridization patterns obtained on the Arabidopsis 
micro-arrays. The layout of the Arabidopsis micro-array is as follows: the left panel 
contains the ESP fragment probes derived from Columbia (upper half) and Landsberg 
(lower half), while the right panel contains the control monomorphic probes with 
respectively the negative control fragments (- control) always carrying a probing 
endonuclease site and the positive control fragments (+ control) carrying no probing 
endonuclease site. The upper part of the figure shows the hybridization patterns 
obtained with uncleaved sample DNA, while the lower part of the figure shows the 
hybridization patterns obtained with cleaved sample DNA. The false color code is as 
follows: green represents hybridization with the Cy3-labeled fragments, red represents 
hybridization with the Cy5-labeled fragments, yellow represents hybridization with 
both the Cy3-labeled and the Cy5-labeled fragments, and gray represents no 
hybridization. In this figure a set of idealized results is presented. The hybridization 
patterns with the uncleaved sample DNA show that all probes detect sequences in both 
ecotypes, while the hybridization patterns with the cleaved sample DNA show that the 
ESP fragment probes detect only the sequences in the respective ecotypes from which 
the ESP fragments were isolated. In addition, fragments carrying no site for the 
probing enzyme detect sequences in both ecotypes, while fragments that always carry 
a site for the probing enzyme do not show a hybridization signal. 

Figure 8: False color hybridization patterns obtained on the corn 
micro-arrays. The layout of the corn micro-array is as follows: the left panel of probes 
contains random fragments derived from B73, while the right panel contains Mo17 
fragments. The figure shows four hybridization patterns obtained with respectively 
uncleaved sample DNA, MseI-cleaved, Tsp509I-cleaved and AluI-cleaved 
sample DNA. The uncleaved sample DNA hybridization pattern shows probes that 
hybridize only to B73 (green), respectively Mo17 (red) fragments, which represent 
polymorphisms resulting from mutations in the sampling enzyme recognition sites. The 
cross in the circle indicates that these probes are eliminated from the analysis. The 
cleaved sample DNA hybridization patterns show that the majority of the probes do not 
give a hybridization signal, indicating that their cognate fragments are cleaved by the 
probing enzyme. Most of the probes giving a signal hybridize to both sample DNAs. 
Those that hybridize to only one of the sample DNAs and that were not eliminated 
represent fragments carrying ESPs. The white arrows denote the probes that were 
retained for further analysis. 

Detailed Description of the Invention 

The term "SNP" means Single Nucleotide Polymorphism, i.e. a 
polymorphism involving the mutation of a single base-pair. 

The term "ESP" means Endonuclease Site Polymorphism, i.e. a 
polymorphism involving two alleles one of which is cleaved by an endonuclease 
reagent while the other exhibits (at least partial) resistance to cleavage by the same 
endonuclease under the same conditions. 

The phrase "(restriction) endonuclease reagent" refers to a reagent that 
consists of one or more enzymes and that cleaves nucleic acids with a certain 
specificity, i.e. cleavage involves recognition of a particular sequence or set of 
sequences in the target DNA. Endonuclease reagents include but are not limited to the 
common type II restriction enzymes. 

The term "sampling endonuclease(s)" or "sampling enzyme(s)" refers to 
an endonuclease reagent used to derive sets of fragments from the sample DNA. 

The term "probing endonuclease(s)" or "probing enzyme(s)" refers to an 
endonuclease reagent used to probe the allelic state at specific ESP-sites. 

The term "polymorphism" refers to the existence of two or more alleles 
at significant frequencies (>1%) in the population; polymorphism at a single 
chromosomal location constitutes a genetic marker. 

The term "micro-satellite (DNA)" refers to a small array (often less than 
0.1 kb) of tandem repeats of a very simple sequence, often 1 to 4 base pairs. Variability 
at such a locus is the basis of many genetic markers. 

The term "mutation" means a heritable alteration in the DNA sequence. 

The term "allele" refers to one of several alternative sequence variants 
at a specific locus. 

The term "genotype" is commonly known to mean (i) the genetic 
constitution of an individual, or (ii) the types of allele found at a locus in an individual. 

The term "haplotype" refers to the genotype at a series of linked loci on 
a single chromosome. 

The term "sample DNA" or "sample fragments" refers to the set of 
fragments or amplicons derived from the starting DNA by the RAA method. 

The term "zygosity" refers to the homozygous or heterozygous state. 

The term "homozygosity/homozygous" refers to the presence of identical 
alleles at a locus. 

The term "heterozygosity/heterozygous" refers to the presence of 
different alleles at a locus. 

The term "CpG" means a dinucleotide with a cytosine at the 5'-side and 
a guanine at the 3'-side. CpG is relatively rare in mammalian DNA because of the 
tendency for the cytosine to be methylated and subsequently mutate to thymine by 
deamination. 

The term "ecotype" refers to a naturally occurring (plant) variety; race. 

The term "bi-allelic" refers to a polymorphic locus characterized by two 
different alleles. 

The terms "microarray" and "(DNA-)chip" refer to a multitude of 
spatially addressable nucleic acids that serve as probes. The microarray may be used 
in the form of a planar solid support, a bead, a sphere, or a polyhedron. Fabrication 
is done either by in situ combinatorial synthesis of oligonucleotides using 
photolithography, or by robotic spotting of off-chip prepared DNA onto a solid 
surface. 

The methods of the present invention differ conceptually from 
previously described restriction enzyme-dependent assays (supra) that essentially detect 
a fragment length polymorphism. With the present method, starting DNA is restricted 
prior to the amplification reaction and, rather than analyzing the obtained amplification 
product, the presence or absence of amplification is measured to determine the allelic 
state at an ESP site. The treated DNA is preferably amplified by using a polymerase 
chain reaction and is preferably analyzed by means of hybridization against arrays of 
probe DNAs. With the present method, a sample-amplicon, and consequently a 
hybridization signal, is either present or virtually absent. This feature represents a 
major advantage in that it results in a more accurate distinction between variable 
nucleotides than is possible by differential hybridization to allele-specific 
oligonucleotides, and because it greatly facilitates the identification of a set of generally 
useful hybridization conditions. Also, the methods of the invention permit the use of 
both oligonucleotides as well as DNA fragments as probe DNAs. While hybridization 
to arrays allows the simultaneous analysis of a large number of ESPs, it should be clear 
that the amplification of sample DNA, treated with probe restriction endonuclease 
reagent, can be analyzed by any of a variety of methods well known in the art. In these 
methods, an ESP is identified either by the presence of a recognition site for the probe 
restriction endonuclease reagent (which will result in the failure of the sample DNA to 
amplify) or by the loss of a recognition site which will allow amplification of an 
otherwise unamplifiable sample DNA. Alternative methods include, but are not limited 
to, gel-electrophoretic analysis and the TaqMan assay [Holland P. M. et al., Proc. 
Natl. Acad. Sci. 88: 7276-7280 (1991); with the latter assay detection is done during 
rather than after the amplification reaction]. 

One of the advantages of the method of the invention is the ability to 
calibrate the measured signal against that obtained in a control experiment where 
digestion with the probe restriction endonuclease reagent is omitted. Comparison of the 
respective hybridization signals, following various corrections and normalization 
procedures, is essential for the genotyping of ESPs and the accurate determination of 
the zygosity. The cleaved and uncleaved material can, in principle, be hybridized 
separately but a preferred method consists of hybridizing a mixture of the differentially 
labeled samples to the same array. The present invention is exemplified by several 
specific formats described below. 
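
The calibration against the undigested control can be sketched as a simple ratio test. This is an illustrative assumption rather than the procedure of the invention: the inputs are taken to be already corrected and normalized, a heterozygote is assumed to retain roughly half of the control signal, and the thresholds `low`, `high` and `min_control` are hypothetical values that would have to be determined empirically.

```python
def call_genotype(cleaved_signal, uncleaved_signal,
                  low=0.25, high=0.75, min_control=100.0):
    """Call the allelic state at one ESP from normalized hybridization signals.

    All thresholds are hypothetical placeholders, not values from the source.
    """
    if uncleaved_signal < min_control:
        # Without a usable control signal the ratio is not interpretable.
        return "no call"
    ratio = cleaved_signal / uncleaved_signal
    if ratio < low:
        return "homozygous, site present"   # both alleles cleaved
    if ratio > high:
        return "homozygous, site absent"    # neither allele cleaved
    return "heterozygous"                   # about half of the signal remains


calls = [
    call_genotype(50.0, 1000.0),
    call_genotype(520.0, 1000.0),
    call_genotype(980.0, 1000.0),
    call_genotype(5.0, 20.0),
]
```

Using the ratio rather than the raw cleaved signal is what lets the undigested sample act as a per-probe reference for amplification and hybridization differences.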

(I) Format-I RAA: Choice of sampling and probing restriction 
endonuclease reagents. In one of its embodiments the present invention is directed 
to methods for detecting ESPs in a "restricted amplicon assay" (RAA) which comprises 
preparing concomitantly amplifiable restriction fragments from the starting DNA 
(sample DNA). When generating discrete sets of DNA fragments from genomic DNA, 
the following parameters are important: the average fragment size and the total number 
of fragments. The optimal fragment size for use in the methods (and materials) of the 
present invention is a trade-off: the fragments must be sufficiently small for 
amplification with roughly equal efficiency (in general ≤500 base pairs) and large 
enough to have on average one cleavage site for the probing endonuclease reagent. 
In addition to average fragment size, the number of fragments determines the 
complexity of the sample DNA, which is critical in view of the limitations of the 
detection sensitivity of micro-array hybridization. In general, the current state of the 
art of microarray hybridization is such that the number of sample fragments should not 
exceed 100,000. All of the above-mentioned requisites can be met by the appropriate 
choice of sampling and probing enzymes. A preferred method of the present invention 
to prepare sample DNAs (amplicons) involves the use of two different sampling 
enzymes, a rare cutter endonuclease (e.g., hexacutter) combined with a frequent cutter 
endonuclease (e.g., tetracutter), as described in EP 0 534 858 A1 which describes a 
method called AFLP and which is incorporated herein by reference. As can be seen 
from Figure 5, the rare cutter enzyme produces large fragments that upon cleavage 
with the frequent cutter enzyme are cut into a number of smaller fragments. This dual 
cleavage generates two types of fragments: the majority having both ends produced by 
the frequent cutter (type I) and a minority of fragments having a rare cutter end and a 
frequent cutter end (type II). After ligating different adapters to each of the ends and 
using appropriate primers targeted to the ends of the fragments, only the type II 
fragments will be amplified efficiently (see Figure 5). The type I fragments amplify 
with greatly reduced efficiency presumably because the synthetic sequences at the two 
ends constitute an inverted repeat. In general the type II fragments will amplify 
synchronously using a single PCR primer pair that attaches to the ends of the 
fragments. The size limit is typically around 500 base pairs, but can be increased by 
using a different DNA polymerase and other reaction conditions. Thus, as outlined 
above the number of amplifiable fragments will be determined primarily by the choice 
of the rare cutter restriction enzyme. By approximation, this number equals two times 
the number of cleavage sites for the rare cutter. In a preferred embodiment, restriction 
enzymes recognizing 6 nucleotides (hexacutters) or more are used as rare cutters. The 
use of a frequent cutter recognizing 4 nucleotides (tetracutter) as second sampling 
enzyme results in the production of fragments in the optimal size range for 
co-amplification. As probe restriction endonuclease reagents, different tetracutter or 
pentacutter enzymes can be used. The probe restriction endonuclease reagent and the 
frequent cutter sampling enzyme should preferably be chosen such that the ratio of the 
cleavage frequencies of probing over sampling reagent is > 0.5 and < 3. This will 
ensure that a substantial fraction of the target fragments are cleaved once by the 
probing enzyme. It is noted that ESPs cannot be genotyped when the fragments are 
cleaved more than once by the probing enzyme. Also, it should be recognized that 
cleavage with the probe restriction endonuclease reagent results in a significant 
reduction (typically 2-4 fold) of the fragment complexity. 
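
The sizing rules in this paragraph can be checked with a back-of-the-envelope sketch. It assumes a random sequence with equal base frequencies, so that a site of n defined bases occurs about once every 4^n base pairs; real genomes deviate from this model, and measured cleavage frequencies should be preferred where available. The function names and the 100 Mb genome size are hypothetical.

```python
def expected_sites(genome_size_bp, site_length):
    """Expected number of recognition sites under a uniform random-sequence model."""
    return genome_size_bp / 4 ** site_length


def amplifiable_fragments(genome_size_bp, rare_site_length=6):
    """Approximate type II fragment count: two per rare-cutter site."""
    return 2 * expected_sites(genome_size_bp, rare_site_length)


def ratio_in_preferred_range(probe_frequency, sampling_frequency):
    """Check the preferred window (> 0.5 and < 3) for probing over sampling frequency."""
    ratio = probe_frequency / sampling_frequency
    return 0.5 < ratio < 3


n_fragments = amplifiable_fragments(100_000_000)      # hypothetical 100 Mb genome
within_limit = n_fragments <= 100_000                  # array complexity limit
same_frequency_ok = ratio_in_preferred_range(0.004, 0.004)
too_frequent_ok = ratio_in_preferred_range(0.02, 0.004)
```

For the hypothetical 100 Mb genome and a hexacutter, the model gives roughly 48,800 amplifiable fragments, which stays below the 100,000-fragment limit stated above.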

Alternative schemes - different from the one described above - that meet 
the requisites of sample complexity, average fragment size, and occurrence frequency 
of the probe reagent and that will perform equally well, will be readily apparent to one 
of ordinary skill in the art. Alternative schemes may include the use of pairs of 
frequent cutters, followed by selective amplification (described in EP 0 534 858 A1), 
or the use of type IIS restriction enzymes. Type IIS restriction enzymes are 
characterized by an asymmetric recognition sequence. Most of these enzymes cleave 
at a defined distance to one side of the recognition site and generate single stranded 
overhangs that have different sequences. Ligation of adaptor sequences that are 
complementary to only one type of overhang allows the amplification of specific 
subsets of fragments [Kikuya Kato, Nucleic Acids Res. 23: 3685-3690 (1995)]. With 
this strategy the set of fragments obtained with the sampling enzymes can be broken 
up in a defined number of complementary and roughly equally complex subsets. Thus, 
with these enzymes it is possible to tune the complexity of the sample. The same 
strategy can be applied by making use of type II enzymes that have an interrupted 
palindromic recognition sequence. 
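
The subset selection by overhang can be pictured with a small sketch. It is a simplification that represents each fragment only by a name and one overhang written 5' to 3', and it assumes that an adaptor ligates only to a perfectly complementary overhang; the fragment names and sequences are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def possible_overhangs(length):
    """All overhang sequences of a given length; each one defines one subset."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def select_subset(fragments, adaptor_overhang):
    """Keep the fragments whose overhang pairs with the adaptor overhang."""
    wanted = reverse_complement(adaptor_overhang)
    return [name for name, overhang in fragments if overhang == wanted]


fragments = [("f1", "AGGT"), ("f2", "CCTA"), ("f3", "AGGT")]  # hypothetical
subset = select_subset(fragments, "ACCT")
n_subsets = len(possible_overhangs(4))
```

Choosing how many adaptor sequences to use is then the knob that sets the complexity of the amplified sample.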

Type of mutations detected by format-I RAA: In essence the method 
of the invention aims to detect mutations affecting the recognition sequences of the 
site-specific probe endonuclease reagents. When the probe enzyme cleaves a sample 
fragment, the fragment is prevented from being amplified and as a consequence will 
not give a hybridization signal with its cognate probe. Mutations affecting the 
recognition sequence of the probe enzyme will allow amplification of the sample 
fragment and will restore the hybridization signal. It is recognized that mutations other 
than those affecting the probe enzyme recognition sites may affect the hybridization 
signals. In particular, mutations affecting the recognition sites of the sampling 
enzymes may also lead to a loss of hybridization signal. Consequently, the mere 
detection of a hybridization difference between two samples does not qualify the 
difference as being due to an ESP for the probing enzyme. For this one must also 
assay the two samples without probing enzyme cleavage; only those differences that are 
correlated with the cleavage by the probing enzyme qualify as genuine ESPs as defined 
according to the present invention. Therefore, a preferred embodiment of the methods 
of the present invention comprises the comparison of the hybridization signals obtained 
with and without cleavage of the same starting material by the probe endonuclease 
reagent. Preferably, the digested and undigested sample DNAs are differentially 
labeled such that equivalent amounts of the material can be mixed and hybridized 
against the same array of probes. It is noted that a further advantage of measuring the 
relative hybridization signals obtained with digested and undigested sample DNAs is 
that the signal given by the undigested sample DNA serves as an internal control for 
correcting variations in amplification and hybridization. 
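
The qualification rule described above can be written as a small decision function. This is a sketch under the assumption that each hybridization result has already been reduced to a present/absent call for two samples, A and B; the function name and the category labels are hypothetical.

```python
def classify_probe(uncut_a, uncut_b, cut_a, cut_b):
    """Classify one probe from present/absent calls with and without probing.

    uncut_*: signal without probing enzyme cleavage; cut_*: signal after cleavage.
    """
    if not (uncut_a or uncut_b):
        return "no signal"
    if uncut_a != uncut_b:
        # The difference exists without probing, so it cannot be attributed
        # to the probing enzyme (e.g. a sampling-site mutation).
        return "sampling-site polymorphism"
    if cut_a != cut_b:
        # The difference appears only after probing enzyme cleavage.
        return "ESP"
    return "monomorphic, not cleaved" if cut_a else "monomorphic, cleaved"


labels = [
    classify_probe(True, True, True, False),
    classify_probe(True, False, True, False),
    classify_probe(True, True, False, False),
    classify_probe(True, True, True, True),
]
```

Only the first case, where the samples agree without cleavage and differ after it, would be retained as a genuine ESP.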

Identification and design of informative probes to detect 
ESP-harboring fragments. In a preferred embodiment of the present invention sample 
DNAs (amplicons) are hybridized to micro-arrays comprising a set of probe DNAs 
which are designed such that each probe will hybridize specifically to one sample DNA 
fragment. For each set of sample DNA fragments a specific set of probes is 
developed that will detect all the ESPs present in the set of sample DNAs. Since in 
most applications only a (minor) fraction of the sample DNAs will actually carry an 
ESP for a particular probing reagent, the set of probe DNAs will preferably consist of 
a subset of the sample DNA fragments that are informative in that they hybridize to 
ESP-harboring sample fragments. Preferably, the probes are highly specific for the 
ESP-carrying sample fragments, and do not cross-hybridize with other fragments in the 
sample. This feature is verified by testing the candidate probes in control hybridization 
assays. When developing or designing the probes care should be taken to avoid 
hybridization of the labeled primer used to amplify the sample fragments. When the 
probes correspond to a subset of the sample fragments, preferably an alternative set of 
adaptors should be used for their amplification. 

The sections below describe different approaches that may be used to 
assemble sets of unique probe DNAs for fabricating the micro-arrays. Three 
alternative approaches are presented, and their choice is determined primarily by the 
degree of nucleotide sequence variation, and hence the ESP frequency, present in the 
species under study. 

(1) Direct screening. When the ESP frequency is high, such that 10% or more of the 
sample fragments carry ESPs, a realistic approach for assembling ESP probes is 
to array individual sample fragments and test which of them detect an ESP in the 
test material under study. The advantage of this approach is that the same set of 
fragments can be tested with different probe enzymes. After the screening one 
will retain only those probes that yield a clear-cut difference in hybridization 
between the different test DNAs. This approach is illustrated in Example 2. 

(2) Gel-based screening. With genomic DNA exhibiting intermediate ESP frequencies
(a few %), useful probes can be identified with a gel-based screening approach
in which the ESPs are identified by comparing the patterns of sample fragments
obtained from cleaved and uncleaved genomic DNA of various individuals. The
polymorphic fragments can then be isolated from the gel and cloned or amplified.
In a second phase, these probe fragments are verified in a micro-array
hybridization assay. This approach is illustrated in Example 1.

(3) Batch-wise hybridization selection method. Since both approaches described above
are inefficient and labor intensive when the ESP frequency is low (< 1%), it is
advantageous to directly select or enrich ESP-carrying fragments. Such an
approach is described in greater detail in Example 3.
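
The choice among the three approaches can be summarized as a simple decision rule. The sketch below is illustrative only: the function name is hypothetical, and because the text gives "10% or more", "a few %" and "< 1%" without stating exact boundaries in between, the sketch assumes that everything from 1% up to 10% is handled by gel-based screening.

```python
def choose_probe_strategy(esp_fraction: float) -> str:
    """Suggest a probe-assembly approach from the fraction (0.0 to 1.0) of
    sample fragments expected to carry an ESP.

    Cut-offs: >= 10% and < 1% come from the text; treating the whole
    1% to 10% band as 'gel-based' is an assumption of this sketch.
    """
    if not 0.0 <= esp_fraction <= 1.0:
        raise ValueError("esp_fraction must lie between 0 and 1")
    if esp_fraction >= 0.10:
        return "direct screening"            # approach (1)
    if esp_fraction >= 0.01:
        return "gel-based screening"         # approach (2)
    return "batch-wise hybridization selection"  # approach (3)


if __name__ == "__main__":
    for fraction in (0.25, 0.025, 0.005):
        print(f"{fraction:.3f} -> {choose_probe_strategy(fraction)}")
```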

The methods of the invention can be used with any type of micro-array:
spotted ESP-carrying fragments, spotted oligonucleotides or oligonucleotides
synthesized on solid supports using photolithography [Fodor S. P. A. et al., Science
251: 767-773 (1991)]. Oligonucleotide probes can easily be designed based on the
nucleotide sequences of the ESP-carrying fragments. Also, the methods of the
invention are not limited to the use of planar arrays containing spatially addressable
probes. A person of skill in the art will recognize that the methods may also employ
a multitude of identifiable solid phase particles (e.g. beads, spheres, and polyhedra),
each carrying a different probe. Examples of such use are described by Fulton, R.
[U.S. Patent No. 5,736,330] and Mandecki, W. [U.S. Patent No. 5,736,332].

(II) Format-II RAA: General outline

The 'format-I RAA', as described above, can be converted to a
'format-II assay' when sufficient sequence information of ESP-containing sample
fragments becomes known. Format-II RAAs can also be designed on the basis of the
known sequences of genomic regions that harbor an ESP and that are available through
publicly accessible databases. The approach involves the targeted sampling of starting
material and consists of the design of dedicated primer pairs that flank the ESP sites.
As in format-I RAA, if the site is intact, the starting DNA will be cleaved and no
PCR product will be generated. Only when the site is mutated will the amplicon be
generated. In practice, multiple ESP-containing genomic regions are co-amplified after
cleavage with the probing restriction endonuclease reagent. The ultimate sample DNA
used in the hybridization reaction is composed of several such multiplex PCR reactions
pooled together. The feasibility of this approach is evidenced by the recent paper of
Wang et al., Science 280: 1077-1082 (1998), incorporated herein by reference. The
methods for format-II RAA described here are identical to the approach described by
Wang et al. in the way certain allelic regions are co-amplified, but fundamentally
different in the way they are diagnosed. The present method takes advantage of the
clear distinction between having or not having an amplicon depending upon the allelic
state of the endonuclease target site. The Wang et al. approach, in contrast, relies on the
detection of a hybridization difference as a result of a single nucleotide variation in the
PCR product. This requires a much more elaborate and redundant hybridization assay.

Similar to format-I RAA, a preferred method consists of comparing the
hybridization signals obtained with and without cleavage with the probe restriction
endonuclease reagent. Preferably, the respective amplification reactions are
differentially labeled such that the resulting amplicons can be mixed and hybridized
against the same array of probes.

Preferred methods of the format-II RAA are those wherein, of each
PCR primer pair, the primer that remained unlabeled is used as hybridization probe
for the corresponding amplicon. This ensures that the excess unincorporated labeled
primer as well as the primer extension products obtained with this primer cannot anneal
to the arrayed probe. Also, the unlabeled PCR primer is complementary to the labeled
strand of the amplicon.

Furthermore, the format-II RAA method provides a means to monitor
mutations in specific genes or loci in addition to scanning the entire genome. Indeed,
sets of PCR primers that target ESPs in a specific gene or chromosome region can be
assembled.

An RAA assay with positive detection of both alleles. It is
recognized that the 'present/absent score' of the RAA assay cannot (always) distinguish
between different mutations that can affect cleavage by the probe restriction
endonuclease reagent. In practice, an ESP should not be assayed when available
evidence indicates the existence of two or more such mutations at significant
frequencies in the population.

In a preferred embodiment, the present invention is directed to the
detection of SNPs that result in the simultaneous loss and gain of a restriction enzyme
recognition site, i.e. each of the two alleles is associated with a different recognition
site. HgaI (GACGC) and SfaNI (GATGC) are an example of such reciprocal sites. Use of both
probing endonuclease reagents in side-by-side experiments excludes alternative alleles
and results in easy determination of the zygosity (refer to Example 4).
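
The zygosity call from two reciprocal probing enzymes follows directly from the rule that an amplicon is obtained only when the site for the enzyme used is absent on at least one chromosome. A minimal sketch is given below; the function name and the returned labels are hypothetical, and the "no call" branch is an assumption for the case in which neither digest yields an amplicon (for example a failed amplification or a third allele).

```python
def call_zygosity(amplicon_after_hgai: bool, amplicon_after_sfani: bool) -> str:
    """Infer zygosity at a SNP whose two alleles carry an HgaI site and an
    SfaNI site, respectively.

    An amplicon seen after HgaI digestion must come from a chromosome that
    lacks the HgaI site, i.e. one carrying the SfaNI allele, and vice versa.
    """
    if amplicon_after_hgai and amplicon_after_sfani:
        return "heterozygous"
    if amplicon_after_hgai:
        return "homozygous SfaNI allele"
    if amplicon_after_sfani:
        return "homozygous HgaI allele"
    # Neither digest gave a product: not expected for two reciprocal alleles.
    return "no call"


if __name__ == "__main__":
    for hgai, sfani in [(True, True), (True, False), (False, True), (False, False)]:
        print(hgai, sfani, "->", call_zygosity(hgai, sfani))
```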

Multi-allelic haplotyping. A single ESP represents a bi-allelic marker,
which is less informative than a variable micro-satellite, which has multiple alleles.
It is possible, however, to compensate for the lower information content by identifying
several ESPs on a specific chromosomal region. Format-II RAA lends itself readily
to such an approach and involves the design of a primer pair that encompasses a region
with a single site for the various selected probe endonuclease reagents. It should be
recognized as one of the advantages of the present method that multiple ESPs on a
sample amplicon can be interrogated with a single probe. Furthermore, use of the
probing enzymes, either separately or in various combinations, in parallel experiments
allows the construction of the haplotypes for the ESPs under study. In general, the
statistical associations between traits and specific chromosome regions may be more
apparent when haplotypes rather than individual markers are used.

(III) Format-III RAA

In a general sense, the format-III RAA represents a method of choice for
very high-density SNP genotyping because it provides a means to overcome the
intrinsic limitations of both the format-I RAA and the format-II RAA. This is
essentially achieved by performing a stepwise amplification involving a
pre-amplification of sample fragments followed by amplification using multiplexed specific
primers. The principal advantage of the pre-amplification step is to reduce the
complexity of the starting DNA, and thus to provide a more favorable starting point
for performing multiplex PCR reactions. It is noted that this improvement is generally
applicable to any multiplex PCR reaction, and is not limited to the methods of the
present invention. Such an approach can also be used when, for example, SNPs are
genotyped using the methods described by Wang et al.

The principal limitation of the format-I RAA lies in the complexity of
the sample DNA that is hybridized to the microarray. Because the second round of
amplification in format-III yields only very small amplicons, which are all informative,
there is no longer a limitation on the number of sample fragments that are interrogated. In
fact, the entire genome may be sampled in a series of parallel pre-amplification
reactions, and the amplicons generated in the different multiplex PCR reactions can then
be pooled together and hybridized to the microarray.

Likewise, the format-III RAA represents preferred methods of format-II
RAA, especially when the ESPs under study are located on fragments generated by one
set of sampling endonuclease reagents. Such stepwise amplification comprises the
co-amplification of sample fragments with a single pair of primers, followed by the
selective amplification of sets of specific ESP-containing regions (see Figure 5). The
principal advantage of the format-III RAA over format-II RAA is that the initial
amplification of the sampling fragments, representing only a fraction of the total
genome, lowers the amount of starting material required to interrogate very large
numbers of ESPs. Also, the approach will facilitate the multiplex amplification of the
ESP-specific amplicons and, consequently, yield a more robust assay.

One preferred embodiment of the format-III RAA is its use to genotype
large numbers of ESPs identified through the use of the format-I RAA. Indeed,
format-I RAA offers a rapid means to discover large numbers of ESPs in any biological
species where no large body of sequence information is or will be available. Format-I
RAA enables one to discover many sets of ESPs for a number of different probing
enzymes. Using the format-I RAA, each set of ESPs must be assayed on a different
microarray, because otherwise signals for the same sample fragment will overlap with
one another, and thus preclude determination of the proper ESP genotype. Using the
format-III RAA, the ESPs identified with different probing enzymes are now assayed
together on one single microarray, without overlap between the different ESPs. The
reason is that the overlap in the format-I RAA is caused by the non-informative sample
fragments that are always co-amplified with the ESP fragments. These are eliminated
from the mixture by the specific PCR amplification. This embodiment is illustrated in
Examples 2 and 3.

Another preferred embodiment of the format-III RAA is its use to
genotype large numbers of SNPs identified in high-throughput sequencing of genomic
DNA from different individuals from a given species. Given the generally recognized
importance of SNPs for the development of high-resolution genotyping methods,
sequenced SNPs can be expected to accumulate in large numbers in publicly available
databases in the near future. In particular, in the field of human genetic analysis, SNPs
will be discovered at a rapidly increasing rate through the massive genome sequencing
programs now in progress. A similar evolution may be anticipated for many other
species. Hence we decided to perform an in silico analysis of known human SNPs to
further investigate the potential of the invention. More particularly, we have analyzed
the 3,358 SNP sequences present in the SNP database of the Whitehead Institute [Wang
et al., Science 280: 1077-1082 (1998)]. We have determined how many of these SNPs
represent an ESP for each of 34 known palindromic and non-palindromic tetra- and
penta-nucleotide restriction recognition sequences. When extrapolating this number to
the total number of ESPs in the human genome, assuming a grand total of 3 million
ESPs, it appears that the number of detectable ESPs per probing restriction enzyme
is in the range of 25,000 to 150,000. A cumulative analysis reveals that 53% of the
SNPs affect at least one of the 34 restriction sites; a total of 28% affect the recognition
site for one of the available tetracutter enzymes. The principal conclusion from this
analysis is that many of the considered enzymes, used as probing enzymes according
to the methods of the present invention, will interrogate sufficient SNPs to be able to
build a high-density map of the human genome. It should also be noted that the use of
multiple probing enzymes is easily accommodated in the targeted assay because the
sample has to be subdivided anyway over a number of parallel multiplex PCR
reactions. This embodiment is illustrated in Example 4.
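
The source does not describe how the in silico analysis was implemented. The following is a minimal sketch of one way to test whether a single SNP represents an ESP for a given recognition sequence: the site must occur, on either strand and overlapping the variable base, in one allele but not in the other. All function names and the example flanking sequences are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def _site_pattern(site: str) -> str:
    """Turn a recognition sequence into a regex; N matches any base."""
    return site.replace("N", "[ACGT]")


def has_site_at_snp(flank5: str, allele: str, flank3: str, site: str) -> bool:
    """True if the site occurs on either strand in a window in which every
    possible match necessarily includes the SNP base."""
    k = len(site)
    left = flank5[-(k - 1):] if k > 1 else ""
    right = flank3[:k - 1]
    window = left + allele + right
    patterns = {_site_pattern(site), _site_pattern(reverse_complement(site))}
    return any(re.search(p, window) for p in patterns)


def is_esp(flank5: str, allele_a: str, allele_b: str, flank3: str, site: str) -> bool:
    """A SNP is an ESP for `site` if exactly one allele carries the site."""
    return (has_site_at_snp(flank5, allele_a, flank3, site)
            != has_site_at_snp(flank5, allele_b, flank3, site))


if __name__ == "__main__":
    # Hypothetical C/T SNP: the C allele reads GACGC (HgaI), the T allele
    # reads GATGC, the reverse complement of the SfaNI site GCATC.
    for name, site in [("HgaI", "GACGC"), ("SfaNI", "GCATC"), ("MspI", "CCGG")]:
        print(name, is_esp("AAGA", "C", "T", "GCAA", site))
```

Counting the SNPs for which `is_esp` is true, enzyme by enzyme, would give the kind of per-enzyme tally reported in Table I.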

It is noted that the format-III RAA may be performed according to
different procedures. One such procedure is diagrammed in Figure 5, in which the
test DNA is first sampled using a sampling endonuclease reagent, pre-amplified and
then treated with the probing endonuclease reagent. Variations on this procedure are
readily recognized by those skilled in the art and include, for example, concomitant
treatment of the test DNA with both the sampling and the probing endonuclease
reagents and the preparation of sampled DNA fragments using arbitrary PCR priming
methods [Williams et al., Nucleic Acids Res. 18: 6531-6535 (1990)]. Note that in case
the treatment with the probing endonuclease reagent is performed prior to the
pre-amplification, the subsequent amplification can be performed with any pair of PCR
primers directed against the ESP-carrying fragments, thus overcoming the
limitation of using PCR primers flanking the ESPs.



Table I. Analysis of 3,358 SNPs in the Whitehead SNP database. The table lists the
number of SNPs that represent an ESP for various probing enzymes. The last column
shows the estimated number of ESPs for each enzyme in the entire human genome
(refer to text for details).

Type of probing reagent    Enzyme     Site     Number of SNPs   Total number of ESPs

Tetracutter enzymes        Tsp509I    AATT     122              109,000
                           MaeII      ACGT     106               95,000
                           AluI       AGCT      98               88,000
                           NlaIII     CATG     158              141,000
                           MspI       CCGG      77               69,000
                           BstUI      CGCG      27               24,000
                           BfaI       CTAG      67               60,000
                           Sau3A      GATC      58               52,000
                           HinPI      GCGC      49               44,000
                           HaeIII     GGCC      52               46,000
                           Csp6I      GTAC      71               63,000
                           TaqI       TCGA      50               45,000
                           MseI       TTAA     109               97,000

Pentacutter enzymes        Tsp4CI     AGNCT    114              102,000
                           BssKI      CCNGG     79               71,000
                           DdeI       CTNAG    122              109,000
                           HinfI      GANTC     77               69,000
                           Fnu4HI     GCNGC     71               63,000
                           Sau96I     GGNCC     64               57,000
                           MaeIII     GTNAC     70               63,000

Non-palindromic enzymes    AciI                111               99,000
                           MnlI       CCTC     175              156,000
                           BbvI       GCAGC     65               58,000
                           BsmAI      GTCTC     67               60,000
                           BsmFI      GGGAC     39               35,000
                           FokI       GGATG     66               59,000
                           HgaI       GACGC     31               28,000
                           PleI       GAGTC     39               35,000
                           SfaNI      GCATC     51               46,000
                           AlwI       GGATC     37               33,000
                           BsrI       ACTGG     76               68,000
                           HphI       GGTGA     69               62,000
                           MboII      GAAGA     85               76,000
                           TspRI      CAGTG     94               84,000
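
The last column of Table I can be reproduced by scaling each database count by the ratio of the assumed genome-wide total (3 million, as stated in the text) to the 3,358 SNPs analyzed, and rounding to the nearest thousand. The sketch below shows this arithmetic for a few rows; the rounding rule is inferred from the table values rather than stated in the source, and the names are hypothetical.

```python
DATABASE_SNPS = 3358          # SNPs analyzed in the Whitehead database
GENOME_TOTAL = 3_000_000      # genome-wide total assumed in the text


def extrapolate(count_in_database: int) -> float:
    """Scale a per-enzyme count in the database to the whole genome."""
    return count_in_database * GENOME_TOTAL / DATABASE_SNPS


def to_nearest_thousand(value: float) -> int:
    """Round to the nearest thousand (rounding rule inferred from Table I)."""
    return int(round(value / 1000.0)) * 1000


if __name__ == "__main__":
    counts = {"Tsp509I": 122, "BstUI": 27, "NlaIII": 158, "MnlI": 175}
    for enzyme, count in counts.items():
        print(f"{enzyme:8s} {count:4d} -> {to_nearest_thousand(extrapolate(count)):,}")
```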

The following illustrative examples were chosen to represent the
spectrum of genomic complexities and the spectrum of degrees of genetic variation
which are susceptible to analysis using the methods of the present invention:

Example 1 describes analysis of Arabidopsis (low genomic complexity,
low genetic variation).

Example 2 describes genetic analysis of corn (high genomic complexity,
high genetic variation).

Examples 3 and 4 describe genetic analysis in humans (high genomic
complexity, low genetic variation).

Numbers given in the examples that relate to the occurrence
frequency of certain restriction sites, as well as to the average size of the generated
fragments, are in part based on computer simulations using publicly available DNA
sequences.

Example 1
Genetic Analysis in Arabidopsis

In this example, a fragment analysis-based approach is used to generate
a set of genomic fragments carrying ESPs between the Arabidopsis ecotypes Landsberg
and Columbia, which are commonly used for genetic studies in the model organism.
Arabidopsis is an example of a low complexity genome (size ~120 Mb), and the two
ecotypes exhibit a moderate level of genetic variability. Previous studies have revealed
that the average nucleotide sequence variation between the two ecotypes is in the order
of 1 polymorphism in 150 nucleotides. Consequently, the fraction of fragments expected
to carry an ESP for tetranucleotide recognizing restriction enzymes is expected to be
in the range of 2.5% (1:40). With such a low frequency, it is helpful to use a selection
procedure to isolate the rare fragments containing ESPs.
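
The source does not show how the 2.5% figure is derived. One simple model that gives the same order of magnitude is sketched below, under two assumptions that are ours and not the source's: polymorphisms fall independently at each base with the stated average spacing, and a fragment carries a single probing site that is abolished by any polymorphism within it.

```python
def esp_fraction(site_length: int, polymorphism_spacing: float) -> float:
    """Probability that at least one of the `site_length` bases of a
    recognition site is polymorphic, given one polymorphism per
    `polymorphism_spacing` nucleotides on average (independence assumed)."""
    per_base = 1.0 / polymorphism_spacing
    return 1.0 - (1.0 - per_base) ** site_length


if __name__ == "__main__":
    # Tetranucleotide site, 1 polymorphism per 150 nucleotides (Arabidopsis)
    fraction = esp_fraction(4, 150)
    print(f"expected ESP fraction: {fraction:.2%} (about 1 in {1 / fraction:.0f})")
```

Under these assumptions the estimate comes out slightly above 2.6%, close to the 1:40 quoted in the text.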

In essence the procedure described in this example comprises the
following steps:

1) Identification of a set of about 200 genomic fragments carrying
Landsberg/Columbia ESPs using a gel-electrophoretic approach.

2) Isolation and characterization of the ESP-carrying DNA
fragments (ESP fragments).

3) Generation of micro-arrays with the ESP fragments.

4) Confirmation of the ESPs by hybridization.

Step 1. Identification of ESP fragments

Sampling enzymes. In the present example EcoRI, a restriction enzyme
recognizing 6 nucleotides (hexacutter), in combination with BfaI, a restriction enzyme
recognizing 4 nucleotides (tetracutter), are chosen as sampling enzymes. From the
random frequency of occurrence of 6 nucleotide sequences (every 4,000 bases), the
number of sites for hexacutter restriction enzymes in this genome is predicted to be in
the range of 30,000. In addition to cleavage with a hexacutter, the genomic DNA is
also cut with a tetracutter so as to generate PCR amplifiable fragments of an average
size of a few hundred base pairs. Cleavage with the two enzymes gives rise to two
types of fragments: a majority of fragments resulting from cleavage by the tetracutter
enzyme alone and a smaller set of fragments produced by the two enzymes (see Figure
5). Since the majority of the hexacutter fragments will give rise to two fragments
having a hexacutter end and a tetracutter end (see Figure 5), this procedure will yield
a mixture of about 60,000 fragments of this type. Upon amplification using the
procedure described below, only the fragments carrying a tetracutter end and a
hexacutter end are amplified efficiently (Figure 5).
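
The site and fragment counts quoted above can be checked with a short calculation. The sketch assumes, as the text does, a random sequence with equal base frequencies, and uses the ~120 Mb genome size given for Arabidopsis; the function name is hypothetical.

```python
def expected_sites(genome_bp: int, site_length: int) -> float:
    """Expected number of occurrences of one specific site of the given
    length in a random sequence with equal base frequencies."""
    return genome_bp / 4 ** site_length


if __name__ == "__main__":
    genome_bp = 120_000_000                       # ~120 Mb
    spacing = 4 ** 6                              # 4,096, i.e. "every 4,000 bases"
    hexacutter_sites = expected_sites(genome_bp, 6)
    # Each hexacutter site contributes two fragments with one hexacutter end
    # and one tetracutter end (one on either side of the site).
    mixed_end_fragments = 2 * hexacutter_sites
    print(f"average spacing of a 6-bp site: {spacing} bp")
    print(f"expected hexacutter sites:      {hexacutter_sites:,.0f}")
    print(f"hexacutter/tetracutter fragments: {mixed_end_fragments:,.0f}")
```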

Probing enzymes. As probing enzymes many different tetracutter
enzymes can be used. Ideally, the probing enzyme cleaves most of the sample
fragments once. Because plant DNA has a high AT content, the preferred tetracutters
are those that have an AT bias in their recognition sequence. In general, the choice of
an optimal tetracutter may be determined by particular features of the genome being
analyzed (e.g., AT and GC content). In the present example, MseI (recognition site =
TTAA) was chosen. Tsp509I (recognition site = AATT) is an alternative. It is also
conceivable to use mixtures of two or more tetracutter enzymes. The EcoRI-BfaI
sample/target fragments that are cleaved and not cleaved with the MseI probing enzyme
are referred to as cleaved and uncleaved sample/target DNA, respectively.

Screening for ESP-carrying fragments. To detect ESP fragments, subsets
of uncleaved and cleaved EcoRI-BfaI sample fragments from both ecotypes are
amplified and the amplicons are compared following gel-electrophoretic fractionation.
Subsets of the EcoRI-BfaI sample fragments are selectively amplified as described
[Vos, P. et al., Nucleic Acids Res. 23: 4407-4414 (1995); Zabeau, M. and Vos, P.,
European Patent Application EP 0534858 (1993), both of which are incorporated herein
by reference]. Given the complexity of the sample (~50,000 fragments), the selective
amplifications are performed with EcoRI and BfaI primers having two and three
selective nucleotides, respectively. This equals 1024 (16 x 64) different selective
amplification reactions.

The experimental procedure described by Vos P. et al. is followed
except that the template fragments are incubated at 65°C for 10 minutes to
heat-inactivate the T4 ligase enzyme and, when applicable, digested with the probing
enzyme prior to amplification. The structures of the EcoRI and BfaI adaptors are as
follows [see, e.g., Vos, P. et al., supra]:

5'-CTCGTAGACTGCGTACC        (SEQ ID NO: 1)
      CATCTGACGCATGGTTAA-5' (SEQ ID NO: 2)

5'-GACGATGAGTCCTGAG         (SEQ ID NO: 3)
       TACTCAGGACTCAT-5'    (SEQ ID NO: 4)

The EcoRI (radiolabeled by 5'-phosphorylation) and BfaI primers,
having two and three selective nucleotides, respectively, have the following sequences
(where N represents A, C, G, or T):

5'-GACTGCGTACCAATTCNN       (SEQ ID NO: 5)
5'-GATGAGTCCTGAGTAGNNN      (SEQ ID NO: 6)
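
The figure of 1024 selective amplification reactions follows from enumerating every primer with two (EcoRI) or three (BfaI) selective 3' nucleotides. A minimal sketch of this enumeration is given below; the core sequences are SEQ ID NO: 5 and 6 without their N positions, while the function and variable names are hypothetical. The per-reaction fragment count assumes that fragments are distributed evenly over the primer combinations, which is an idealization.

```python
from itertools import product

ECORI_CORE = "GACTGCGTACCAATTC"   # SEQ ID NO: 5 without the two N positions
BFAI_CORE = "GATGAGTCCTGAGTAG"    # SEQ ID NO: 6 without the three N positions


def selective_primers(core: str, n_selective: int) -> list[str]:
    """All primers obtained by appending every combination of
    `n_selective` bases (A, C, G, T) to the 3' end of the core."""
    return [core + "".join(ext) for ext in product("ACGT", repeat=n_selective)]


if __name__ == "__main__":
    ecori_primers = selective_primers(ECORI_CORE, 2)    # 16 primers
    bfai_primers = selective_primers(BFAI_CORE, 3)      # 64 primers
    primer_pairs = list(product(ecori_primers, bfai_primers))
    print(len(ecori_primers), len(bfai_primers), len(primer_pairs))
    # Idealized number of fragments amplified per reaction (~50,000 in total)
    print(round(50_000 / len(primer_pairs)))
```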

Using these reagents, most of the obtainable target fragments contain a
cleavage site for the probing enzyme and, consequently, will not be amplified when the
target DNA is cleaved. Most of the fragments that survive the treatment with the
probing enzyme occur in both ecotypes, and thus carry no ESP. Occasionally fragments
are found that appear in both ecotypes when the target DNA is not digested and that
are present in only one of the two ecotypes after digestion. These represent true ESPs
for the probing enzyme. In addition, fragments will also be found that show typical
AFLP polymorphism between the two ecotypes [Vos, P. et al., Nucleic Acids Res. 23:
4407-4414 (1995)]. Such polymorphisms are apparent in the fragment patterns
obtainable with the undigested sample DNAs. A typical result is shown in Figure 6, in
which the electrophoretic patterns are shown of selectively amplified EcoRI-BfaI
fragments from the ecotypes Columbia and Landsberg obtained without and with
digestion with the MseI probing enzyme.

Systematic comparison of the patterns of ecotypes Columbia and
Landsberg before and after digestion allows the identification of EcoRI-BfaI sample
amplicons that carry an ESP for the probing enzyme. Using MseI as probing enzyme,
it is estimated that a total of ~200 polymorphic fragments which are present in only one
of the ecotypes can be identified.
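
The comparison of band patterns described above can be expressed as a small classification rule over the four lanes in which a given fragment is scored. This is a sketch only: the class, field and label names are hypothetical, and real gel data would first require band matching and presence/absence scoring, which are not shown.

```python
from typing import NamedTuple


class BandPattern(NamedTuple):
    """Presence (True) or absence (False) of one band in the four lanes."""
    columbia_undigested: bool
    columbia_digested: bool
    landsberg_undigested: bool
    landsberg_digested: bool


def classify_band(band: BandPattern) -> str:
    """Classify a fragment from its band pattern in the two ecotypes,
    without and with digestion by the probing enzyme."""
    if band.columbia_undigested != band.landsberg_undigested:
        # Already polymorphic without the probing enzyme
        return "AFLP polymorphism"
    if not band.columbia_undigested:
        return "not amplified"
    if band.columbia_digested != band.landsberg_digested:
        # Present in both when undigested, in only one after digestion
        return "ESP"
    if band.columbia_digested:
        return "monomorphic, no probing site"
    return "monomorphic, probing site in both ecotypes"


if __name__ == "__main__":
    print(classify_band(BandPattern(True, True, True, False)))
    print(classify_band(BandPattern(True, True, False, False)))
    print(classify_band(BandPattern(True, False, True, False)))
```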

Step 2. Isolation and characterization of ESP fragments

Each of the ESP polymorphic fragments is eluted from the gel matrix,
re-amplified and cloned into a suitable plasmid vector (e.g. TA cloning system;
Invitrogen, Carlsbad, CA, U.S.A.). In each case, two clones are selected for sequence
determination. Most duplicate clones will yield the same sequence. Duplicate clones
that give different sequences are not retained for further work. Since the nucleotide
sequence of over one third of the Arabidopsis genome is available in the public
databases (e.g., GenBank), the chromosomal location of one third of the ESP fragments
can be determined by matching the fragment sequences to the genomic sequence.
Furthermore, since the genomic sequence is derived from ecotype Columbia, we expect
a perfect match with the fragment sequences isolated from the same ecotype. The
sequences of the fragments isolated from ecotype Landsberg will reveal single
nucleotide differences, amongst which the potential restriction site mutations affecting
the MseI recognition sites should be apparent.

In addition to the ESP polymorphic fragments, a number of
non-polymorphic control fragments are processed in the same way. Two types of such
control monomorphic fragments are isolated: fragments that do not carry a site for the
probing enzyme and fragments that carry a site for the probing enzyme in both
ecotypes. These fragments will serve the purpose of verifying the hybridization on the
micro-arrays.

Step 3. Fabrication of ESP micro-arrays

Micro-arrays of amplified fragments. The insert DNAs from the
sequence-verified clones are amplified, e.g. with the use of non-selective EcoRI and
BfaI primers. PCR products are verified by agarose gel electrophoresis and retained if
a single product of the correct mobility is present. Following ethanol precipitation,
the resuspended PCR products are arrayed at high density on standard glass slides (25
x 76 mm) using either the Multigrid robotic spotter (GeneMachines™, Genomic
Instrumentation Services Inc., Menlo Park, CA, U.S.A.) or the BioChip Arrayer™
(Packard Instrument Company, Meriden, CT, U.S.A.). The DNAs are spotted in a
logical order with respect to the ecotype from which the fragments were isolated (upper
and lower panel) as shown in Figure 7. In addition, a set of DNAs from monomorphic
control fragments is spotted next to the ESP fragment DNAs (right panel in Figure
7).



Micro-arrays of oligonucleotides. Based on the nucleotide sequences of 
the ESP fragments, oligonucleotides can be designed that can serve as hybridization 
probes to specifically detect each amplified sample fragment. The oligonucleotide probe 
should preferably match with a sequence that is located to one side of the ESP, 
opposite the side where the sequence targeted by the labeled primer is located. In this 
way the background is minimized because the linear amplification products generated 
by the labeled primer following digestion with the probing enzyme are not detected. 
The ESP fragment specific oligonucleotides are spotted in a micro-array format in 
exactly the same way as the amplified ESP fragments. 



Step 4. Micro-array-based detection of ESPs

Preparation of the sample DNAs. For each ecotype, sample DNA is
prepared in two different ways. Genomic DNA, digested with the sampling restriction
enzymes EcoRI and BfaI, is amplified either as such or after cleavage with the
probing enzyme MseI. The amplification reactions are performed with a fluorescently
labeled EcoRI primer and an unlabeled BfaI primer, both without selective nucleotides.
The EcoRI primer is labeled by incorporation of Cy3 (green) and Cy5 (red) amidites
during primer synthesis (Amersham Pharmacia Biotech, Uppsala, Sweden). For both
Columbia and Landsberg, the cleaved sample is amplified with a Cy3-labeled primer while
the uncleaved fragments are amplified with a Cy5-labeled EcoRI primer. In addition,
the Landsberg digested material is also amplified with a Cy5-labeled EcoRI PCR
primer. Three different hybridization solutions are then prepared by mixing equal
amounts (i.e. equal volumes) of the Cy3- and Cy5-labeled amplification reactions: one
from the Columbia cleaved and uncleaved samples, a second from the Landsberg
cleaved and uncleaved samples, and a third by mixing the differentially labeled cleaved
samples of both ecotypes.

In case arrays of PCR products, rather than oligonucleotides, are used
as probes (refer to step 3), the co-amplification of the EcoRI-BfaI sample fragments is
preferably accomplished with a pair of adaptors that differs from those attached to the
arrayed probes. The alternative EcoRI and BfaI adaptors have the following structure:

5'-GAGCATCTGACGCATCC        (SEQ ID NO: 26)
      GTAGACTGCGTAGGTTAA-5' (SEQ ID NO: 27)

5'-CTGCTACTCAGGACTG         (SEQ ID NO: 13)
       ATGAGTCCTGACAT-5'    (SEQ ID NO: 14)

The cognate non-selective EcoRI and BfaI primers have the following
sequences:

5'-CTGACGCATCCAATTC         (SEQ ID NO: 28)
5'-CTACTCAGGACTGTAG         (SEQ ID NO: 16)

Micro-array hybridization. Each of the hybridization solutions is allowed
to hybridize to the arrayed probes using protocols well known in the art. The
experimental conditions depend primarily on the nature of the probes, PCR-amplified
fragments versus oligonucleotides. Both types of experiments are amply described in
the literature: Wodicka, L. et al., Nature Biotechnol. 15: 1359-1367 (1997); Lockhart, D.
J. et al., Nature Biotechnol. 14: 1675-1680 (1996); DeRisi, J. L. et al., Science 278:
680-686 (1997); Shalon, D. et al., Genome Res. 6: 639-645 (1996); Piétu, G. et al.,
Genome Res. 6: 492-503 (1996); Chee, M. et al., Science 274: 610-614 (1996); Wang,
D. G. et al., Science 280: 1077-1082 (1998); Winzeler, E. A. et al., Science 281: 1194-
1197 (1998), all of which are incorporated herein by reference.

A laser scanning system (ScanArray 3000; General Scanning Inc.,
Watertown, MA, U.S.A.) is used to detect the two-color fluorescence hybridization
signals from the micro-arrays at a resolution of 10 micron per pixel. A separate scan
is carried out for each of the two fluorophores used. Scanning parameters and laser
power settings are adjusted to normalize the signal in the two channels (channel-1/Cy3;
channel-2/Cy5). The obtained digital images are analyzed using the ImaGene™ image
analysis software (BioDiscovery Inc., Los Angeles, CA, U.S.A.). The extracted
quantitative data are transferred to a spreadsheet for further analysis.

The present hybridization experiment is essentially set up as a
confirmation of the gel-electrophoretic data (refer to step 1), and has, therefore, a
predictable outcome. In addition, a number of control probes are included on the
biochip that detect monomorphic EcoRI-BfaI Arabidopsis fragments (i.e., fragments
on which a site for the probing enzyme is either present or absent in both ecotypes).
The results from these control probes allow correction for background and optical
cross-talk between the two channels, as well as calibration of the red and green
hybridization signals. It is anticipated that the vast majority of the processed data are
unambiguous with respect to the allelic state of a sample fragment and in agreement
with the gel-electrophoretic analysis. Figure 7 shows a false-color representation of the
idealized results of the present experiment using a fictitious array of probes. It cannot
be excluded that certain hybridization results are not in agreement with the
gel-electrophoretic assay and/or that certain probes do not allow unambiguous
determination of the allelic state of the cognate sample fragment. Such probes should
be excluded from the micro-arrays that are used to genotype experimental Arabidopsis
samples, other than the Columbia and Landsberg controls used in the present
illustrative example.

In routine genotyping experiments, either one of the hybridization
schemes outlined above can be used. Determination of the allelic state can be done by
comparing the hybridization signals obtained with and without cleavage of the starting
DNA with the probe reagent. Alternatively, allele-calling could be based on a
comparison of the signals obtained with the test sample and an appropriate control (e.g.
Columbia or Landsberg DNA), both cleaved with the probe endonuclease reagent. The
samples that need to be compared can, in principle, be hybridized separately, but a
preferred method consists of hybridizing a mixture of differentially labeled samples to
the same array.
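
The first allele-calling scheme, comparing the signals obtained with and without cleavage, can be sketched as a ratio test per probe. The source gives no numerical thresholds, so the background value, the minimum uncleaved signal and the ratio cut-off below are all hypothetical placeholders that would have to be calibrated on the monomorphic control probes; the function name is likewise an assumption.

```python
def call_site_state(cleaved_signal: float,
                    uncleaved_signal: float,
                    background: float = 0.0,
                    min_uncleaved: float = 100.0,
                    ratio_cutoff: float = 0.5) -> str:
    """Call the state of the probing-enzyme site for one probe.

    cleaved_signal   -- signal of the sample amplified after cleavage (e.g. Cy3)
    uncleaved_signal -- signal of the sample amplified without cleavage (e.g. Cy5)
    All thresholds are illustrative assumptions, not values from the source.
    """
    cleaved = max(cleaved_signal - background, 0.0)
    uncleaved = max(uncleaved_signal - background, 0.0)
    if uncleaved < min_uncleaved:
        # The fragment is not detected even without cleavage
        return "no call"
    ratio = cleaved / uncleaved
    # The amplicon survives cleavage only when the site is absent (mutated)
    return "site absent" if ratio >= ratio_cutoff else "site present"


if __name__ == "__main__":
    print(call_site_state(900.0, 1000.0))
    print(call_site_state(50.0, 1000.0))
    print(call_site_state(10.0, 20.0))
```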

Example 2
Genetic Analysis in Corn

In this example, the utility of the method of the invention for
marker-assisted selection applications in plant and animal breeding is illustrated. Corn has been
chosen because it is a typical representative of crop species having a complex genome.
The large size of the genome (2,400 Mb), the frequent occurrence of repetitive DNA
sequences and the high degree of genetic variation all constitute technical challenges.
In this example, an approach is used that is based on the generation of a set of genomic
fragments carrying ESPs from two well-known inbred lines of corn, B73 and Mo17, from
which many of the corn elite lines are derived. Another reason for choosing these
lines is that a well-studied recombinant inbred population derived from these lines is
available. This population can be used to map the set of ESPs. The genetic map of ESP
markers will prove to be an effective tool for genetic selection in corn breeding. It is
evident, however, that a broader survey of the corn germplasm with a total of 10 to 20
lines will give a large number of additional ESPs (possibly 2 or 3 times as many) and
will eventually result in a higher-resolution genetic map.

The ESP-harboring fragments could very well be identified by the gel-electrophoretic
approach described for Arabidopsis (Example 1). However, an alternative strategy may be
used given that the corn germplasm, like many crop species, exhibits a high degree of
genetic variation. Indeed, based on previous studies, the average nucleotide sequence
variation in the corn germplasm is estimated to be in the order of 1 difference in 15 to
30 nucleotides. This corresponds to a frequency of ESPs in the recognition sites of
tetracutter restriction enzymes of 1 in 4. At this frequency it becomes feasible to
directly examine arrays of random B73/Mo17 fragments for the presence of ESPs using the
present RAA method without prior screening or selection. The strategy also lends itself
readily to screening with several different probing enzymes.
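
The relation between nucleotide divergence and the ESP frequency quoted above can be
checked with a short calculation. The sketch below is illustrative only and is not part
of the described method. It assumes that sequence differences occur independently at
each nucleotide and that any difference inside a 4-bp recognition site gives rise to an
ESP; the function name is hypothetical.

```python
def site_polymorphism_probability(diff_rate, site_length=4):
    """Probability that at least one base of a recognition site differs
    between two lines, assuming independent differences per nucleotide."""
    return 1.0 - (1.0 - diff_rate) ** site_length


if __name__ == "__main__":
    # Divergence range quoted in the text: 1 difference in 15 to 30 nucleotides.
    for denom in (15, 30):
        p = site_polymorphism_probability(1.0 / denom)
        print(f"1 difference in {denom} nt: P(site differs) = {p:.2f}")
```

Under these assumptions the upper end of the divergence range (1 in 15) gives about
0.24, close to the 1 in 4 figure, while 1 in 30 gives about 0.13.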

In the present example, two different approaches for assaying ESPs are
used. The first method (format-I RAA) is similar to the one described in Example 1,
and detects ESPs in fragments sampled with a pair of restriction enzymes. In the
second method (format-III RAA) individual ESPs are selectively amplified from the
sampled fragments with dedicated primer sets. The principal advantage of the latter
approach is that ESPs detected with several different probing enzymes can be assayed
simultaneously, and that multiplex amplification of ESP-specific PCR products is made
considerably more robust.

In essence the procedure described in this example comprises the 
following steps: 

1. Identification of a set of candidate ESP fragments from the
inbred lines B73 and Mo17

2. Development of a corn ESP micro-array

3. Genetic mapping of a B73/Mo17 recombinant inbred population
and of segregating populations

Step 1. Identification of candidate ESP fragments
Cloning of a set of sample fragments. To clone a set of random
fragments from the inbred lines B73 and Mo17, the enzyme combination PstI and BfaI
is used. The hexanucleotide-recognizing enzyme PstI was chosen because of the large
size of the corn genome. It is estimated that this enzyme has around 30,000 sites in the
corn genome. The second enzyme, the tetracutter BfaI, is expected to cleave in the
majority of the cases on both sides of the PstI sites. The double digestion will therefore
generate about 60,000 sample fragments with an average size of 400-500 base pairs.
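
The sampling logic (each PstI site flanked by a BfaI site on either side, hence two
PstI-BfaI fragments per PstI site) can be illustrated with an in silico double digest.
This is a minimal sketch, not the procedure of the example: it uses the start of each
recognition sequence as an approximate cut coordinate and ignores the exact cleavage
positions and overhangs. Function names are hypothetical; the recognition sequences
(PstI, CTGCAG; BfaI, CTAG) are the standard ones.

```python
import re

PSTI = "CTGCAG"  # PstI recognition sequence (hexacutter)
BFAI = "CTAG"    # BfaI recognition sequence (tetracutter)


def site_positions(seq, site):
    """Start positions of every occurrence of a recognition sequence."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def psti_bfai_fragments(seq):
    """Intervals between adjacent sites where one end is PstI and the other BfaI."""
    cuts = sorted(
        [(p, "PstI") for p in site_positions(seq, PSTI)]
        + [(p, "BfaI") for p in site_positions(seq, BFAI)]
    )
    return [(a, b) for (a, end_a), (b, end_b) in zip(cuts, cuts[1:]) if end_a != end_b]


if __name__ == "__main__":
    toy = "AAAA" + BFAI + "TTTTT" + PSTI + "GGGGG" + BFAI + "AAAA"
    print(psti_bfai_fragments(toy))
    # Genome-scale estimate from the text: about 30,000 PstI sites, two fragments each.
    print(30_000 * 2)
```

BfaI-BfaI intervals are discarded by the end-type filter, mirroring the enrichment of
PstI-BfaI fragments achieved experimentally by the amplification step.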

Following double digestion of the genomic DNA, PstI- and BfaI-adaptors
were ligated to the fragment ends and the material amplified with non-selective
PstI and BfaI primers. The structures of the PstI- and BfaI-adaptors are based
on those described by Vos P. et al., Nucleic Acids Res. 23: 4407-4414 (1995):

5'-CTCGTAGACTGCGTACATGCA (SEQ ID NO: 7)
3'-CATCTGACGCATGT (SEQ ID NO: 8)

5'-GACGATGAGTCCTGAG (SEQ ID NO: 3)
3'-TACTCAGGACTCAT (SEQ ID NO: 4)



The corresponding PstI and BfaI non-selective primers have the following sequences:

5'-GACTGCGTACATGCAG (SEQ ID NO: 9)
5'-GATGAGTCCTGAGTAG (SEQ ID NO: 10)

The amplification step enriches the PstI-BfaI fragments over the large
excess of BfaI-BfaI fragments. After amplification the fragments are fractionated on
an agarose gel to eliminate the fragments smaller than 100 base pairs, and cloned in an
appropriate vector (e.g. TA cloning system; Invitrogen, Carlsbad, CA, U.S.A.).

Preparation of spotted micro-arrays with the cloned sample DNA
fragments. The insert DNAs from the two libraries of cloned PstI-BfaI sample
fragments (obtained from the B73 and Mo17 inbred lines) are amplified from the
clones using the non-selective PstI and BfaI primers. Following purification and
concentration, the amplicons are arrayed as described in Example 1. A total of 20,000
(i.e. 10,000 from each library) candidate probe DNAs are spotted.

Micro-array hybridization and selection of candidate ESP-fragments.
From genomic DNA of the inbred lines B73 and Mo17, four different sets of
PstI/BfaI-digested amplified DNA are prepared. An alternative pair of adaptors and
non-selective amplification primers is used for this:

5'-GAGCATCTGACGCATGTTGCA (SEQ ID NO: 11)
3'-GTAGACTGCGTACA (SEQ ID NO: 12)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)
3'-ATGAGTCCTGACAT (SEQ ID NO: 14)

5'-CTGACGCATGTTGCAG (SEQ ID NO: 15)

5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

The sample fragments are amplified either as such or after digestion with
one of three alternative probing enzymes, MseI, Tsp509I and AluI. As probing
enzymes many different tetracutter or pentacutter enzymes can be used. Because plant
DNA has a high AT content, the preferred enzymes are those that have an AT bias in
their recognition sequence. Alternatively, mixtures of two or more tetracutter or
pentacutter enzymes can be used.

For each of the B73 samples, a Cy3(green)-labeled PstI primer is used,
whereas the Mo17-derived fragments are amplified with a Cy5(red)-labeled PstI primer
(refer to Example 1). Four hybridization solutions are then prepared by mixing
equal amounts of the B73 and Mo17 samples for each treatment (uncleaved, MseI-cleaved,
Tsp509I-cleaved, and AluI-cleaved). Each of the 4 mixes is allowed to hybridize to the
micro-arrays. Analysis of the scanned images involves normalization using the multitude
of probes on the arrays that detect monomorphic fragments. Figure 8 shows a false-color
representation of the idealized results of the present experiment using a fictitious array
of probes.

Analysis reveals that candidate ESP fragments are readily identified by
scoring the probes that hybridize with only one of the two inbred line sample DNAs
after cleavage with the probe enzyme (Figure 8). The quantitative analysis allows
the use of an unambiguous cut-off threshold of a 10-fold difference in the normalized
signal intensities for scoring ESPs. It should be pointed out that the assay identifies
both bona fide ESPs and polymorphisms in the sampling enzyme sites. Most of the
latter polymorphisms result in a marked hybridization difference with the sample DNAs
not cleaved with the probe enzyme (see Figure 8). Analysis of 180 probes reveals that
roughly 6% of the sample fragments carry ESPs for MseI, Tsp509I, or AluI, in
accordance with the expected ESP mutation frequency. The analysis of 20,000 cloned
probe fragments is thus expected to yield a total of 1,200 fragments carrying ESPs for
the three probe enzymes tested. By using additional tetracutter and pentacutter
enzymes (see Table I), the fraction of ESP-carrying fragments may be as high as 25%,
amounting to 5,000 ESPs.
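
The scoring rule described above (normalize on monomorphic probes, then apply a 10-fold
cut-off to the B73/Mo17 signal ratio) can be sketched as follows. This is an
illustration under stated assumptions rather than the analysis actually used: the
median-based scaling, the function names and the toy intensities are all hypothetical,
and intensities are assumed to be background-corrected and strictly positive.

```python
from statistics import median


def normalized_ratios(cy3, cy5, monomorphic_idx):
    """Cy3/Cy5 (B73/Mo17) ratios, scaled so that the median ratio of the
    monomorphic reference probes equals 1."""
    scale = median(cy3[i] / cy5[i] for i in monomorphic_idx)
    return [c3 / (c5 * scale) for c3, c5 in zip(cy3, cy5)]


def score_candidate_esps(ratios, fold=10.0):
    """Label each probe by which line still gives a signal after cleavage."""
    calls = []
    for r in ratios:
        if r >= fold:
            calls.append("B73 signal only")
        elif r <= 1.0 / fold:
            calls.append("Mo17 signal only")
        else:
            calls.append("no call")
    return calls


if __name__ == "__main__":
    cy3 = [200.0, 210.0, 190.0, 2500.0, 15.0]   # B73 sample (toy values)
    cy5 = [100.0, 105.0, 95.0, 100.0, 150.0]    # Mo17 sample (toy values)
    ratios = normalized_ratios(cy3, cy5, monomorphic_idx=[0, 1, 2])
    print(score_candidate_esps(ratios))
    # Expected yield quoted in the text: 6% of 20,000 probes.
    print(round(0.06 * 20_000))
```

A probe scored "B73 signal only" after cleavage indicates that the Mo17 fragment
carries the probe enzyme site while the B73 fragment does not, and vice versa.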

Of all probes that exhibit a differential hybridization with the cleaved
sample DNAs, only those in which the recognition site for the probing enzyme is
present are retained for development of a corn micro-array. Sequence determination
of these probe-fragments reveals the position of the recognition site for the probe
enzyme. Thus, only those probes are retained that fail to give a signal with the
cleaved sample DNA from the same inbred line from which they were isolated. Such
probes exhibit the hybridization pattern shown in the Table here below and are marked
with an arrow in Figure 8.

               B73/Mo17 (Cy3/Cy5) normalized hybridization signal
               Undigested     MseI/Tsp509I/AluI-digested
B73-probes     ~1             < 0.1
Mo17-probes    ~1             > 10



Step 2. Development of a corn ESP micro-array
Sequencing of the candidate ESPs and design of marker specific primers.
Clones corresponding to the probes that yield the desired hybridization pattern (Figure
8) are sequenced. The majority of the insert DNAs derived from these clones contain
a single recognition site for the probing enzyme. For each unique candidate ESP, two
specific PCR primers, flanking the restriction site, are designed.

In addition, the sequence of a limited set of probes that yielded invariant
hybridization signals is also determined. PCR primers targeting these monomorphic
sequences are included as references; they are used to calibrate the hybridization
signals.

Validation of the candidate ESPs and fabrication of corn micro-arrays.
The candidate ESPs, identified under step 1, are subjected to a confirmatory
experiment using the format-III approach. First, four pre-amplification reactions are
performed with a single primer pair and using the PstI-BfaI fragments, undigested or
digested with either one of the three probing enzymes, as template material. These
amplification reactions reduce the complexity of the DNA under study by more than
two orders of magnitude while at the same time generating a large enough amount of
material for the subsequent multiplex marker-specific PCRs. The pre-amplifications
are then used for the PCR rescue of each of the characterized candidate ESPs using
dedicated primer couples [refer to Wang, D. G. et al. Science 280: 1077-1082 (1998)].
Particular sets of the ESP-specific primers that amplify the same type of ESP (i.e.
ESPs for one particular probing enzyme) are combined in a single reaction, together
with the appropriate pre-amplification material as template. One of the ESP-specific
primers is either Cy3- or Cy5-labeled; the other remains unlabeled. The Cy3-primers
are used for the multiplex amplification of the DNA that has previously been digested
with a probing enzyme, whereas the Cy5-primers are used with undigested control
DNA. The PCR products from the various multiplex reactions performed on both
digested and undigested DNA are pooled to obtain a single hybridization
mixture per starting DNA. The B73 and Mo17 derived material is analyzed in
parallel experiments. The set of ESP-specific unlabeled PCR primers serves as
hybridization probes and is arrayed in the same way as amplification products.
Conditions used are similar to those previously described for hybridization against
oligonucleotide probes and are readily determined by one of ordinary skill in the art.

Direct comparison of the normalized Cy3 and Cy5 hybridization signals
allows determination of the allelic state of the endonuclease target site in B73 versus
Mo17. Primer pairs that do not allow unambiguous allele calling or that do not
confirm the candidate ESPs identified with PstI-BfaI sampling (refer to step 1) are not
retained for further work.

Step 3. Genetic analysis of a B73/Mo17 recombinant inbred population and of
segregating populations

Genetic analysis of a B73/Mo17 recombinant inbred population. A collection of
recombinant inbred lines derived from a cross between B73 and Mo17 is publicly
available and provides a most useful set of lines for verifying and mapping the
collection of ESP markers. The advantage of recombinant inbred lines over segregating
populations is that each inbred line contains a different set of homozygous chromosome
segments derived from either parent line. Consequently each ESP will be scored as
either present or absent. Preparation of the sample DNAs and hybridization against the
arrayed probes are performed as described under step 2. The experiment will, in the
first place, allow the testing of selected ESPs in over 100 measurements; the results
will lead to the development of a second generation system that will only detect the
most consistent ESPs. In addition, the linkage analysis of the segregation data will
allow the construction of a fine genetic map of the markers. Finally, based on the
mapping data, an ordered ESP micro-array is developed for corn.

Genetic analysis of segregating populations. Although the ESPs were isolated from
two inbred lines, it is anticipated that the above-mentioned ordered ESP micro-arrays
will detect sufficient genetic polymorphism in other corn lines to be useful for
marker-assisted selection. To demonstrate the applicability, one could choose either a
segregating F2 population or a back-cross population. Sample preparations and
hybridizations are again performed as described under step 2. In this experiment, the
ESP markers must be scored quantitatively so as to differentiate between heterozygosity
and homozygosity. Because only the most consistent markers are retained, a two-fold
difference in signal intensity is easily monitored. The approach used consists of
normalizing the hybridization signal intensities and then applying a mixture model
analysis on the normalized data. This statistical approach consists of determining
whether the relative signal intensities can be grouped into three discrete classes,
corresponding respectively to homozygous present, heterozygous and homozygous
absent. ESP markers that do not fulfill this criterion should be eliminated from the
analysis.
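
The mixture model analysis mentioned above can be illustrated with a small
one-dimensional Gaussian mixture fitted by expectation-maximization. The text does not
specify the model, so everything below is an assumption made for illustration: three
Gaussian components with a shared standard deviation, initial means spread over the
data range, and a separation heuristic (adjacent class means more than three standard
deviations apart) standing in for the "three discrete classes" criterion. Function
names are hypothetical.

```python
import math


def fit_three_class_mixture(values, n_iter=200):
    """Fit a 1-D Gaussian mixture (3 components, shared sd) by EM.

    Returns (means, sd, labels): means sorted ascending, and for each value
    the index of its most likely class (0 = lowest signal class).
    """
    lo, hi = min(values), max(values)
    means = [lo, (lo + hi) / 2.0, hi]      # initial means span the data range
    sd = max((hi - lo) / 6.0, 1e-6)
    weights = [1.0 / 3.0] * 3
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each class for each value
        resp = []
        for x in values:
            dens = [w * math.exp(-0.5 * ((x - m) / sd) ** 2)
                    for w, m in zip(weights, means)]
            total = sum(dens)
            if total == 0.0:
                # every density underflowed: assign to the nearest mean
                nearest = min(range(3), key=lambda k: abs(x - means[k]))
                resp.append([1.0 if k == nearest else 0.0 for k in range(3)])
            else:
                resp.append([d / total for d in dens])
        # M-step: update class weights, means and the shared sd
        nk = [sum(r[k] for r in resp) for k in range(3)]
        means = [sum(r[k] * x for r, x in zip(resp, values)) / nk[k]
                 if nk[k] > 0 else means[k] for k in range(3)]
        var = sum(r[k] * (x - means[k]) ** 2
                  for r, x in zip(resp, values) for k in range(3)) / len(values)
        sd = max(math.sqrt(var), 1e-6)
        weights = [n / len(values) for n in nk]
    order = sorted(range(3), key=lambda k: means[k])
    rank = {k: i for i, k in enumerate(order)}
    labels = [rank[max(range(3), key=lambda k: r[k])] for r in resp]
    return [means[k] for k in order], sd, labels


def has_three_discrete_classes(means, sd, min_gap_in_sd=3.0):
    """Heuristic (an assumption): adjacent class means are well separated."""
    return all(b - a > min_gap_in_sd * sd for a, b in zip(means, means[1:]))


if __name__ == "__main__":
    signals = [0.02, 0.05, 0.03, 0.48, 0.52, 0.50, 0.98, 1.03, 1.00]
    means, sd, labels = fit_three_class_mixture(signals)
    print(labels, has_three_discrete_classes(means, sd))
```

With normalized signals, class 0 would correspond to homozygous absent, class 1 to
heterozygous and class 2 to homozygous present; a marker for which the separation check
fails would be a candidate for elimination.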



Example 3 

Human Genetic Analysis Using the Format-I RAA
This example illustrates the application of the method of the invention
for genome-wide genetic analysis in humans. Human is an example of a high
complexity genome (size ~3,000 Mb) combined with a very low level of genetic
variability. Single nucleotide differences between pairs of allelic sequences from
different individuals occur approximately once in every 1000 base pairs; in the
population at large, the frequency may be in the order of 1:300. As with Arabidopsis,
such a low frequency necessitates the use of a selection procedure for the
isolation/enrichment of the rare ESP-harboring fragments. In this example a batch-wise
hybridization is used to accomplish this.

Based on the known mutation frequencies, it can be estimated that the
ESP frequency for a tetracutter probing enzyme is in the order of 1 in 125 recognition
sites. This low level of genetic variation, in combination with the sensitivity of
micro-array hybridization, limits the number of ESPs that can be detected in a single assay
(typically ranging from a few hundred to one thousand, a few thousand at the most).
These limitations can, to a certain extent, be overcome by choosing probing enzymes
that recognize tetranucleotide sites containing a CpG dinucleotide. Indeed, it is well
documented that a substantial fraction (> 25%) of the nucleotide substitutions in the
human genome result from C -> T transitions in CpG dinucleotides. Such CpG
dinucleotides represent mutational hotspots in vertebrates because a large fraction of
the cytosines are methylated and subsequently mutate to thymine by deamination. It is
estimated that the mutation frequency of methylated cytosines is 6 to 8-fold higher than
average. Hence probing enzymes that cleave CpG-containing recognition sites will
yield ESPs at correspondingly higher frequencies, estimated at ~5%. However, the
adverse consequence of the high mutation rate is that CpG is relatively rare in
mammalian DNA, occurring with a frequency of 1 in 100 nucleotides [Wang, D. G.
et al. Science 280:1077-1082 (1998)] instead of 1 in 16. Likewise the frequency of
CpG-containing tetranucleotide sites is 1 in ~1600 instead of 1 in 256 bases. To
compensate for this, a probe endonuclease reagent can be used comprising two or
more of the following complementary restriction enzymes: TaqI (TCGA), MspI
(CCGG), MaeII (ACGT), and HinPI or HhaI (GCGC). It should be noted however that
cleavage by MaeII as well as the isoschizomers HinPI and HhaI is blocked by
methylation of the cytosine residue within the CpG dinucleotide. These enzymes
will thus only cleave at a fraction of their sites, namely the non-methylated sites.
Analysis of the large amount of publicly accessible human genomic DNA sequence
shows that the cocktail of the 4 enzymes will cleave once in every 400 bp on average.
The total number of sites in the genome is thus in the order of 7.5 million. Assuming
that the ESP frequency is 5%, the enzyme cocktail has the potential of detecting
~375,000 ESPs. In addition to using combinations of restriction endonucleases, one
may also use reaction conditions that decrease the cleavage specificity. Such a strategy
has been applied to obtain a restriction endonuclease reagent, designated CGaseI, that
is capable of cleaving DNA at CpG dinucleotides [Mead D. et al., WO 94/21663].
This CGaseI restriction endonuclease reagent may be particularly useful for the analysis
of human polymorphisms using the methods of the present invention.
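
The figures quoted in this paragraph are mutually consistent, as the following
back-of-the-envelope sketch shows. It is an illustration only: the text derives the
400 bp spacing from analysis of public sequence data, whereas the sketch assumes that
the two bases flanking a CpG are independent and equally frequent, and it ignores
methylation-blocked sites. The function and key names are hypothetical.

```python
from fractions import Fraction


def cpg_site_estimates(genome_bp=3_000_000_000, n_enzymes=4,
                       cpg_freq=Fraction(1, 100), esp_freq=Fraction(5, 100)):
    """Estimate site spacing and ESP counts for NCGN-type probing enzymes."""
    flank = Fraction(1, 4)                  # chance of one specific flanking base
    per_site = cpg_freq * flank * flank     # one specific NCGN site, e.g. TCGA
    cocktail = n_enzymes * per_site         # any of the four distinct sites
    total_sites = genome_bp * cocktail
    return {
        "bases_per_single_site": 1 / per_site,
        "bases_per_cocktail_cut": 1 / cocktail,
        "total_sites": total_sites,
        "expected_esps": total_sites * esp_freq,
    }


if __name__ == "__main__":
    for name, value in cpg_site_estimates().items():
        print(f"{name}: {int(value):,}")
```

With the default values the sketch reproduces the numbers in the text: one specific
site per 1,600 bases, one cocktail cut per 400 bases, 7.5 million sites and 375,000
potential ESPs.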

The example described below illustrates the approach in a limited scale
assay, which characterizes the human ESPs within CpG-containing tetranucleotide
recognition sites using the sampling enzyme combination PacI-BfaI. The rare cutter
PacI is estimated to have only about 50,000 cleavage sites in the human genome; the
frequent cutter BfaI will generate two fragments per PacI site. The enzyme combination
will, therefore, create a moderately complex set of 100,000 PacI-BfaI target fragments.
This fragment set captures a sizable number of CpG-containing restriction sites,
estimated in the order of 40,000. Assuming a 5% ESP frequency, the number of
detectable ESPs is in the order of 2000. It should be stressed that many different
sampling enzyme combinations can be used and that thus a substantial fraction of the
~375,000 ESPs located within NCGN-type restriction sites can be monitored.

The procedure outlined in this example comprises the following steps: 

(1) Development of a set of candidate PacI-BfaI ESP fragments

(2) Genetic analysis of humans using ESP probe fragments 

Step 1. Development of a set of PacI-BfaI probe fragments
A mixture of sample fragments, derived from various individuals in the
population, can be divided in three classes with respect to sites for the probing enzyme:
monomorphic fragments that are devoid of a cleavage site, fragments that are always
cleaved, and fragments that carry one polymorphic recognition site. Fragments that are
digested will be referred to as S+ fragments and fragments lacking the site as S-
fragments. Polymorphic ESP fragments will thus be the only fragments present in both
the S+ and S- population of sampling fragments. This forms the basis for their
selection by batch-wise hybridization: only ESP fragments are capable of annealing
when mixing the S+ and S- fragment collections. The hybridization-selection can be
performed in two different, reciprocal ways: either the S+ fragments can be used to
retrieve the matching S- fragments, or S- fragments are used to collect the
complementary S+ sampling fragments. In one approach, the selected candidate ESP
fragments may be isolated by cloning, arrayed, and subsequently validated by testing
various sample DNAs (e.g. the various sample DNAs used as starting material for the
hybridization-selection). Candidate ESP probe fragments that appear to detect
monomorphic sample fragments may either be removed from the array or retained as
control elements on the array. An alternative approach consists of performing the two
reciprocal hybridization-selections, cloning the selected fragments, and identifying
ESPs by means of matching S+ and S- fragments. The latter strategy is outlined
below.

(i) Preparation of S+ and S- fragments. The preferred starting
material is an equimolar mixture of genomic DNA from a number of representative
individuals. Such individuals (ranging from 5 to 50) may be chosen from various
CEPH (Centre d'Etude du Polymorphisme Humain) pedigrees [Wang, D. G. et al.,
Science 280:1077-1082 (1998)]. Following cleavage of the DNA mixture with the
PacI/BfaI combination of sampling enzymes, appropriate oligonucleotide adapters as
described above are ligated to the fragment ends. This template DNA is divided in two
aliquots and treated separately to prepare respectively the S+ and S- fragment mix. To
prepare the S- fragment mix, the target DNA fragments are cleaved with the probing
enzyme and then amplified. This will result in a mixture of fragments that do not
contain sites for the probing enzyme. Furthermore, the S- fragment mixture may be
prepared by using one biotinylated primer, such that the resulting PCR product can be
captured onto a solid substrate, such as magnetic beads conjugated with streptavidin.
S+ fragments are prepared by (1) amplifying the mixture of PacI-BfaI fragments, (2)
digesting the PCR product with one of the four NCGN-recognizing enzymes, (3)
ligating appropriate adapters to the ends generated by the probing enzyme (see EP 0
534 858, incorporated herein by reference), and (4) re-amplifying the resulting
material using one primer that recognizes the probe enzyme adapter and one primer that
recognizes one specific sampling enzyme adapter. Similar to the S- fragments, the
amplification reaction can be performed making use of a biotinylated primer that
matches the probe enzyme adaptor such that the S+ fragment mixture can be
immobilized.

Two alternative pairs of PacI- and BfaI-adaptors, as well as
corresponding non-selective primers, are used; e.g. set I is used for the amplification
of the S- fragments and set II for the preparation of S+ fragments:
Set I

5'-CTCGTAGACTGCGTACCCAT (SEQ ID NO: 17)
3'-CATCTGACGCATGGG (SEQ ID NO: 18)

5'-GACGATGAGTCCTGAG (SEQ ID NO: 3)
3'-TACTCAGGACTCAT (SEQ ID NO: 4)

5'-GACTGCGTACCCATTA (SEQ ID NO: 19)

5'-GATGAGTCCTGAGTAG (SEQ ID NO: 10)

Set II

5'-GAGCATCTGACGCATGGGAT (SEQ ID NO: 20)
3'-GTAGACTGCGTACCC (SEQ ID NO: 21)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)
3'-ATGAGTCCTGACAT (SEQ ID NO: 14)

5'-CTGACGCATGGGATTA (SEQ ID NO: 22)

5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

The adaptor ligated to the ends generated by the NCGN-cleaving probing enzyme and 
the corresponding amplification primer have the following structures: 

5'-GTCCTCATCGAGCATG (SEQ ID NO: 23)
3'-AGTAGCTCGTACGC (SEQ ID NO: 24)

5'-CCTCATCGAGCATGCG (SEQ ID NO: 25)



(ii) Hybridization-selection step(s). The S- fragment mix is
hybridized to the biotinylated S+ fragments. Following hybridization, the biotinylated
products are captured onto streptavidin-coated magnetic beads. The beads are
repeatedly washed to remove all unhybridized fragments and thereafter the hybridized
S- fragments are eluted. These are then reamplified with the PacI and BfaI primers and
the hybridization-selection procedure is repeated at least once. Finally the amplified
fragments are cloned in an appropriate vector and a series of around 2,000 inserts are
sequenced. To select a set of S+ fragments, this procedure is repeated in reverse, this
time using biotinylated S- fragments. Upon comparison of the S+ and S- sequences, ESP
fragments are readily identified as fragments having partially overlapping sequences
and in which the S- fragment sequence shows a mutated NCGN restriction site at the
internal boundary of the overlap. In this way, ~500 ESPs are readily characterized.
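
The sequence comparison described above can be sketched in a few lines. This is a
simplified illustration with several assumptions that are not in the text: the S+ read
is modeled as beginning with the intact recognition site followed by the sequence up to
the sampling-enzyme end, the S- read is modeled as the full sampling fragment on the
same strand, and adaptor sequences are taken to have been trimmed. The function name
and the toy sequences are hypothetical.

```python
def match_esp(s_plus, s_minus, site):
    """Return the mutated site found in the S- read if the two reads form
    an ESP pair, otherwise None.

    s_plus  : read assumed to start with the intact recognition site
    s_minus : read of the uncleaved allele covering the same region
    site    : recognition sequence of the probing enzyme, e.g. "TCGA"
    """
    if not s_plus.startswith(site):
        return None
    tail = s_plus[len(site):]                 # shared sequence beyond the site
    if not tail or not s_minus.endswith(tail):
        return None                           # the reads do not overlap
    start = len(s_minus) - len(tail) - len(site)
    if start < 0:
        return None
    variant = s_minus[start:start + len(site)]
    # An ESP requires the S- allele to differ from the recognition sequence
    # at the internal boundary of the overlap.
    return variant if variant != site else None


if __name__ == "__main__":
    site = "TCGA"                              # TaqI
    s_plus = site + "GGATCCAAGT"
    s_minus = "AACCTTGG" + "TTGA" + "GGATCCAAGT"
    print(match_esp(s_plus, s_minus, site))
```

In the toy pair the S- read carries TTGA where the S+ read carries TCGA, the kind of
C -> T change at a CpG that the example is designed to capture.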

Step 2. Genetic analysis of humans using ESP probe fragments
The sequence-verified ESP fragments are spotted on micro-arrays for
genetic analysis of human sample DNA. For the preparation of this sample DNA, a pair
of adaptors/primers is used that differs from those attached to the arrayed S- or S+ set of
ESP fragments. From each individual, an undigested control sample and a probe enzyme
digested test sample are prepared. These samples are labeled with Cy3 and Cy5, mixed
and hybridized to the micro-arrays as described before. Alternatively, the hybridization
mixture may be composed of differentially labeled test DNA and previously genotyped
control DNA, both digested with the probing endonuclease. In both cases, the Cy3
(test/digested sample) and Cy5 (control/undigested DNA) signal intensities are normalized
using a number of monomorphic control probes. The ratio of these normalized Cy3/Cy5
signals for each of the ESP probes allows accurate determination of the allelic state of the
sample at each polymorphic site (homozygous S+/S+, homozygous S-/S-, heterozygous
S+/S-).
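
As an illustration of how the normalized Cy3/Cy5 ratio maps onto the three allelic
states, consider the sketch below. It assumes the first labeling scheme (Cy3 for the
digested test sample, Cy5 for the undigested control of the same individual), so that
only alleles lacking the site contribute Cy3 signal. The two thresholds are hypothetical
values chosen for the illustration; in practice they would have to be calibrated on
validated samples.

```python
def call_genotype(cy3_digested, cy5_undigested, low=0.25, high=0.75):
    """Call the allelic state at one ESP probe from normalized signals.

    Expected ratios (idealized): ~0 for S+/S+, ~0.5 for S+/S-, ~1 for S-/S-.
    """
    ratio = cy3_digested / cy5_undigested
    if ratio < low:
        return "S+/S+"   # both alleles cleaved, no amplifiable fragment left
    if ratio > high:
        return "S-/S-"   # neither allele cleaved
    return "S+/S-"       # one allele cleaved, roughly half the signal remains


if __name__ == "__main__":
    for cy3, cy5 in [(5.0, 100.0), (50.0, 100.0), (98.0, 100.0)]:
        print(cy3, cy5, call_genotype(cy3, cy5))
```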

The micro-array hybridization experiment may initially be performed with
the sample DNAs, deriving from a collection of individuals, from which
the ESP probe fragments were isolated. Such an experiment will confirm
the polymorphic nature of the selected probe fragments and allow their testing in a
multitude of measurements. The data will also yield information on the allele frequencies
among an appreciable number of chromosomes.

Example 4

Human Genetic Analysis Using Format-II RAA
As described for corn in Example 2, the format-I ESP assay for human
genetic analysis may be converted to a format-II or a format-III assay. Based on the
sequence of the selected and experimentally validated ESP fragments, it is indeed
possible to design a pair of dedicated, i.e. ESP-specific, PCR primers. Such primers
can be combined in a number of parallel multiplex reactions, which are in turn
combined to obtain the sample DNA [Wang, D. G. et al. Science 280: 1077-1082
(1998)]. This sample DNA is hybridized against a micro-array of spotted S+ ESP
fragments (see Example 3). The experiment is set up such that the fluorescently
labeled ESP-specific primer and the S+ sequences are located on opposite sides of the
polymorphic site. Alternatively, the unlabeled ESP-specific amplification primers may
be arrayed as hybridization probes. The development of a format-II or format-III assay
need not be preceded by the identification of ESP fragments (using one of the methods
described in the previous examples). In the present example, we describe the
development of an RAA assay based on the sequence of previously discovered SNPs.

Close inspection of the known SNPs reveals that a significant percentage
of them are associated with both the loss and gain of a restriction recognition site, i.e.
each of two allelic sequences is associated with a different restriction recognition site.
The single nucleotide substitution may inter-convert recognition sequences that are
identical except for one nucleotide [e.g. PleI (GACTC) and HgaI (GACGC), HgaI and
SfaNI (GATGC), SfaNI and BbvI (GCTGC)]. Alternatively, the allelic recognition
sites may be partially overlapping [e.g. MaeII (ACGTg) and NlaIII (aCATG); in the
latter case the inter-conversion depends on the nature of the upstream or downstream
sequences]. Such mutually exclusive restriction site allelism offers a distinct advantage.
The RAA technique will normally only detect the allele that is devoid of a recognition
site for the probing enzyme; therefore, determination of the zygosity requires careful
calibration of the signal against that observed with undigested control DNA. When each
allele is associated with the presence/absence of a restriction site, two parallel
RAA-assays can be performed, each involving digestion with one of the alternative enzymes.
With such an assay, both alleles can be positively identified and the zygosity is readily
determined. The two parallel assays are best performed in a two-color mode; one of
the primers is differentially labeled (e.g. with Cy3 and Cy5 as described previously)
such that the amplification reactions can be mixed and hybridized against a single array
of probes.
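
The two ideas in this paragraph, recognition sequences that differ by a single
nucleotide and zygosity calling from two parallel digests, can be sketched as follows.
The recognition sequences are those listed in the text; the function names, the use of
signals normalized to an undigested control, and the threshold value are assumptions
made for the illustration.

```python
SITE_PAIRS = [
    ("PleI", "GACTC", "HgaI", "GACGC"),
    ("HgaI", "GACGC", "SfaNI", "GATGC"),
    ("SfaNI", "GATGC", "BbvI", "GCTGC"),
]


def single_base_difference(site_a, site_b):
    """Return (position, base_a, base_b) if the two sites differ at exactly
    one position, otherwise None."""
    if len(site_a) != len(site_b):
        return None
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(site_a, site_b)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def call_zygosity(signal_after_enzyme_a, signal_after_enzyme_b, threshold=0.25):
    """Call zygosity from two parallel RAA assays.

    A fragment survives digestion with enzyme A only if it lacks site A,
    i.e. if it carries the allele bearing site B, and vice versa.
    """
    has_b_allele = signal_after_enzyme_a > threshold
    has_a_allele = signal_after_enzyme_b > threshold
    if has_a_allele and has_b_allele:
        return "A/B"
    if has_a_allele:
        return "A/A"
    if has_b_allele:
        return "B/B"
    return "no call"


if __name__ == "__main__":
    for name_a, site_a, name_b, site_b in SITE_PAIRS:
        print(name_a, name_b, single_base_difference(site_a, site_b))
    print(call_zygosity(0.5, 0.5))
```

Because each allele gives a positive signal in one of the two digests, the call does not
rely on distinguishing a half-strength signal from a full-strength one.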

We have systematically explored the SNP database of the Whitehead
Institute for mutational changes that promote restriction site inter-conversions and have
calculated their occurrence frequency. Two SNP-associated recognition site
inter-conversions were found to occur at high frequency: MaeII -> NlaIII and HgaI ->
SfaNI. In both cases the mutational changes converting one site into another are C->T
(or G->A) transitions occurring in CpG dinucleotides. This finding is entirely consistent
with the fact that this type of mutation occurs with a 6-8 times higher frequency than other
nucleotide substitutions. Based on the number of SNPs found in the Whitehead database,
we estimate the total number of SNPs in the human genome for the enzyme pairs
MaeII/NlaIII and HgaI/SfaNI at respectively 30,000 and 15,000. These numbers are
presumably somewhat overestimated since both MaeII and HgaI are susceptible to CpG
methylation. Consequently the inter-conversion can only be measured at the
non-methylated sites. Therefore, in practice, RAA assays designed on the basis of sequence
data should be validated on a number of test samples. Assays in which no cleavage takes
place at the CpG-containing site in any of the individuals tested should be eliminated
from the RAA bi-allelic marker systems.

The foregoing examples are illustrative of the invention and are not intended to limit
the scope of the invention as set out in the claims. All of the references cited herein are
incorporated by reference.

WE CLAIM:

1. A method for detecting an endonuclease site polymorphism (ESP) in
DNA, the method comprising:

(a) isolating sample DNA;

(b) deriving a set of concomitantly amplifiable target DNA fragments
from the sample DNA;

(c) treating the target DNA fragments obtained in step (b) with a
probe restriction endonuclease reagent;

(d) amplifying the probe restriction endonuclease reagent treated
target DNA fragments of step (c);

(e) analyzing the DNA of step (d) to determine which target
fragments are amplified and/or which target fragments are not amplified; and wherein
target DNA fragments which are amplified lack a recognition site for the probe
restriction endonuclease reagent and target fragments having a recognition site for the
probe restriction endonuclease reagent are not amplified.

2. The method of claim 1 wherein the concomitantly amplifiable target DNA
fragments of step (b) are derived by treatment of the sample DNA with a sampling
restriction endonuclease reagent.

3. The method of claim 2 wherein the concomitantly amplifiable DNA
fragments of step (b) are derived from sample DNA by treatment of the sample DNA
with a first and a second restriction endonuclease reagent.

4. The method of claim 3 wherein said first restriction endonuclease
reagent has a recognition sequence of six or more nucleotides and the second restriction
endonuclease reagent has a recognition sequence of four or fewer nucleotides.

5. The method of claim 3 or 4 wherein said concomitantly amplifiable
target DNA fragments are derived by stepwise treatment of said sample DNA with the
first and the second restriction endonuclease reagents.

6. The method of claim 1 further comprising preparing PCR primers
which flank the endonuclease site polymorphism (ESP) for use in amplifying said
concomitantly amplifiable target DNA fragments.

7. The method of claims 1, 2, 3, and 4 wherein the concomitantly
amplifiable DNA fragments are modified by ligation of adapters to both termini of said
fragments, and wherein said adaptors are capable of serving as primers for
amplification.



8. The method of claim 5 wherein the concomitantly amplifiable DNA 
fragments are modified by ligation of adapters to both termini of said fragments, and 
wherein said adapters are capable of serving as primers for amplification.

9. The method of claim 1 wherein the probe restriction endonuclease 
reagent of step (c) has a recognition sequence comprising six or more nucleotides. 

10. The method of claim 1 wherein the probe restriction endonuclease 
reagent of step (c) has a recognition sequence comprising four or more nucleotides. 

11. The method according to claim 1 wherein the probe restriction 
endonuclease of step (c) has a recognition sequence of two nucleotides. 

12. The method according to claim 1 wherein the order of steps (b) and
(c) is reversed or steps (b) and (c) are carried out simultaneously.

13. The method according to claim 1 wherein said endonuclease site 
polymorphism is an alteration in a concomitantly amplifiable target fragment giving 
rise to a nucleotide sequence that is recognized and cut by the probe restriction 
endonuclease reagent.

14. The method of claim 1 wherein said site polymorphism is an alteration
in the nucleotide sequence of a concomitantly amplifiable target fragment which 
eliminates a recognition sequence for said probe restriction endonuclease reagent. 
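
Claims 13 and 14 describe the two directions an endonuclease site polymorphism can take: an alteration that creates a recognition sequence and one that eliminates it. A minimal sketch of that distinction follows. The two allele sequences and the site GGATCC are hypothetical, and only the top strand is searched, which is adequate for a palindromic site but would need a reverse-complement search for a non-palindromic one.

```python
def count_sites(seq: str, site: str) -> int:
    """Count occurrences of site in seq, including overlapping ones (top strand only)."""
    seq, site = seq.upper(), site.upper()
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == site)


def classify_esp(reference: str, variant: str, site: str) -> str:
    """Compare two alleles of one fragment and report how the site count changes."""
    ref_n = count_sites(reference, site)
    var_n = count_sites(variant, site)
    if var_n > ref_n:
        return "site created"
    if var_n < ref_n:
        return "site eliminated"
    return "no change"


# Hypothetical alleles differing by a single base.
allele_1 = "ATGGATTCAT"
allele_2 = "ATGGATCCAT"
print(classify_esp(allele_1, allele_2, "GGATCC"))
print(classify_esp(allele_2, allele_1, "GGATCC"))
```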

15. The method of claims 1, 2, 3 and 4 wherein said concomitantly
amplifiable DNA fragments are amplified by a polymerase chain reaction.

16. The method of claim 5 wherein said concomitantly amplifiable DNA 
fragments are amplified by a polymerase chain reaction. 


17. The method of claim 1 wherein amplified target fragments are 
identified by their ability to hybridize to cognate probe DNA fragments. 

18. A method for obtaining probe DNA fragments for use in detecting 
endonuclease site polymorphisms, the method comprising: 

(a) isolating sample DNA; 

(b) deriving a set of concomitantly amplifiable target DNA fragments 
from the sample DNA; 

(c) selecting, from the target DNA fragments, probe DNA fragments
having an endonuclease site polymorphism (ESP) for the probe restriction
endonuclease.

19. The method of claim 17 wherein said probe DNA fragments are derived
by digestion of sample DNA with one or more sampling restriction endonuclease
reagents.

20. The method of claim 18 wherein probe DNA fragments are derived by 
digestion of a pool of sample DNAs obtained from one or more individuals of a 
species. 


21. The method of claim 18 wherein the probe DNA fragments are derived
by digestion of a pool of sample DNAs obtained from 10 or more individuals of a
species.

22. The method of claim 18 wherein the probe DNA fragments are derived by
digestion of a pool of sample DNAs obtained from a pool of 50 or more individuals of
a species.

23. The method of any one of claims 19-21 wherein said species is selected 
from the group consisting of procaryotic species and eucaryotic species. 

24. A method for obtaining probe DNA fragments for use in detecting 
endonuclease site polymorphisms (ESP) comprising preparing synthetic 
oligonucleotides based on the nucleotide sequence of amplifiable target DNA fragments 
containing endonuclease site polymorphism(s).

25. A method for producing a microarray of probe DNA, the method
comprising:

(a) isolating sample DNA; 

(b) deriving a set of concomitantly amplifiable target DNA fragments 
from the sample DNA; 

(c) selecting probe DNA fragments having restriction endonuclease site
polymorphisms (ESPs) from the sample restriction endonuclease treated target DNA
fragments of step (b); and

(d) arraying the probe DNA fragments obtained in step (c) on a solid
substrate in a predefined region by attaching the fragments to the substrate.
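
Step (d) of claim 25 calls for each probe fragment to occupy a predefined region of the substrate. One simple way to keep track of such an assignment is a row and column map, sketched below. The near-square grid, the probe identifiers, and the function name are assumptions made for illustration; the claim does not prescribe any particular layout.

```python
import math


def layout_probes(probe_ids: list, columns: int = None) -> dict:
    """Assign each probe a (row, column) position on a rectangular grid.

    If no column count is given, a near-square grid is chosen (an arbitrary
    choice for this sketch).
    """
    n = len(probe_ids)
    if columns is None:
        columns = math.ceil(math.sqrt(n)) if n else 1
    return {pid: divmod(i, columns) for i, pid in enumerate(probe_ids)}


# Hypothetical probe identifiers.
small = layout_probes(["p0", "p1", "p2", "p3", "p4"])
large = layout_probes([f"probe_{i}" for i in range(2000)])
print(small)
print(len(set(large.values())))
```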

26. The method of claim 24 wherein the DNA fragments of step (b) are 
obtained by treating sample DNA with one or more sample restriction endonuclease 
reagents. 



27. The method of claim 24 wherein said probe DNA fragments of step
(d) are synthetic oligonucleotides which correspond to the concomitantly amplifiable
target DNA fragments derivable from said sample DNA and containing an 
endonuclease site polymorphism (ESP). 

28. The method of claim 25, 26 or 27 wherein the solid support is selected 
from a group consisting of a planar solid support, a bead, a sphere and a polyhedron. 

29. The method of claim 25 wherein the microarray comprises at least 2,000
probe fragments.

30. The method of claim 26 wherein the microarray comprises at least 2,000
synthetic oligonucleotides.

31. The method of claim 27 wherein the microarray comprises at least 2,000
probe fragments.

32. The method of claim 28 wherein the microarray comprises at least 2,000
probe fragments.

33. The method of claim 25 wherein the microarray comprises at least 
20,000 probe fragments. 

34. The method of claim 26 wherein the microarray comprises at least 
20,000 synthetic oligonucleotides.

35. The method of claim 27 wherein the microarray comprises at least 
20,000 probe fragments. 



36. The method of claim 28 wherein the microarray comprises at least 
20,000 probe fragments. 

SEQUENCE LISTING



<110> METHEXIS N.V. 

<120> RESTRICTED AMPLICON ANALYSIS 

<130> 29314/34158A 

<140> 
<141>

<150> 60/107,293 
<151> 1998-11-09 

<160> 28 

<170> PatentIn Ver. 2.0

<210> 1 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 1 

ctcgtagact gcgtacc 

<210> 2 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 2 

aattggtacg cagtctac 
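
The orientation note for SEQ ID NO: 2 can be made concrete with a short sketch that rewrites the sequence in the 3' to 5' direction and checks how much of it pairs with SEQ ID NO: 1. That the two sequences are the two strands of one adapter is an assumption made for this illustration; the listing itself describes each only as a primer.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lower-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def as_three_to_five(seq: str) -> str:
    """Rewrite a 5'->3' sequence so that it reads 3'->5'."""
    return seq[::-1]


def duplex_length(top: str, bottom: str) -> int:
    """Longest 3' end of top that base-pairs with the 3' end of bottom."""
    target = reverse_complement(bottom)
    for k in range(min(len(top), len(target)), 0, -1):
        if top[-k:] == target[:k]:
            return k
    return 0


seq_1 = "ctcgtagactgcgtacc"   # SEQ ID NO: 1, spaces removed
seq_2 = "aattggtacgcagtctac"  # SEQ ID NO: 2, spaces removed, 5'->3' as listed

print(as_three_to_five(seq_2))
print(duplex_length(seq_1, seq_2))
```

Under that assumption the two strands pair over 14 bases, leaving the first three bases of SEQ ID NO: 1 and the first four bases (aatt) of SEQ ID NO: 2 single-stranded.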

<210> 3 
<211> 16 
<212> DNA 

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: primer 



<400> 3 

gacgatgagt cctgag 



<210> 4
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 4 

tactcaggac tcat

<210> 5 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<221> misc_feature 
<222> (17) 

<223> At position 17 N = A, C, G, or T 
<220> 

<221> misc_feature 
<222> (18) 

<223> At position 18 N = A, C, G, or T 
<400> 5 

gactgcgtac caattcnn 

<210> 6 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<221> misc_feature 
<222> (17) 

<223> At position 17 N = A, C, G, or T
<220> 

<221> misc_feature
<222> (18) 

<223> At position 18 N = A, C, G, or T 
<220> 

<221> misc_feature
<222> (19)

<223> At position 19 N = A, C, G, or T 
<400> 6 

gatgagtcct gagtagnnn 
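
SEQ ID NOS: 5 and 6 end in positions where N stands for A, C, G, or T, so each entry denotes a family of concrete primers. The following sketch enumerates that family; the function name is hypothetical, and the enumeration order is simply that produced by the standard library.

```python
from itertools import product


def expand_primer(primer: str, bases: str = "acgt") -> list:
    """List every concrete sequence obtained by replacing each n with a, c, g, or t."""
    slots = [bases if ch == "n" else ch for ch in primer.lower()]
    return ["".join(combo) for combo in product(*slots)]


seq_5 = "gactgcgtaccaattcnn"   # SEQ ID NO: 5, spaces removed
seq_6 = "gatgagtcctgagtagnnn"  # SEQ ID NO: 6, spaces removed

primers_5 = expand_primer(seq_5)
primers_6 = expand_primer(seq_6)
print(len(primers_5), len(primers_6))
```

Two N positions give 4 x 4 = 16 primers and three give 4 x 4 x 4 = 64.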

<210> 7 
<211> 21 
<212> DNA 

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: primer 
<400> 7 

ctcgtagact gcgtacatgc a 

<210> 8 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 8 

tgtacgcagt ctac 

<210> 9 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 9 

gactgcgtac atgcag 

<210> 10 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 10 

gatgagtcct gagtag 

<210> 11 
<211> 21 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 11 

gagcatctga cgcatgttgc a 

<210> 12 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 12 
acatgcgtca gatg 

<210> 13 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 13 

ctgctactca ggactg 

<210> 14 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 14 
tacagtcctg agta 

<210> 15 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: primer 
<400> 15 

ctgacgcatg ttgcag 

<210> 16 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 16 

ctactcagga ctgtag 

<210> 17 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 17 

ctcgtagact gcgtacccat 

<210> 18 
<211> 15 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 18 

gggtacgcag tctac 

<210> 19 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 19 

gactgcgtac ccatta 



<210> 20 
<211> 20 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 20 

gagcatctga cgcatgggat 

<210> 21 
<211> 15 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<220> 

<223> Description of Artificial Sequence: primer 
<400> 21 

cccatgcgtc agatg 

<210> 22 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 22 

ctgacgcatg ggatta 

<210> 23 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 23 

gtcctcatcg agcatg 

<210> 24 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<220>

<223> Description of Artificial Sequence: primer 

<400> 24 
cgcatgctcg atga 

<210> 25 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 25 

cctcatcgag catgcg 

<210> 26 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 26 

gagcatctga cgcatcc 

<210> 27 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<220> 

<223> Description of Artificial Sequence: primer 
<400> 27 

aattggatgc gtcagatg 

<210> 28 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 



<400> 28 

ctgacgcatc caattc 
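
Each entry in the listing pairs a declared length (<211>) with the residues that follow <400>, so the two can be cross-checked mechanically. The following is a minimal sketch of such a check, assuming the numeric-identifier layout used above; the function names are hypothetical and the parser handles only the identifiers it needs.

```python
import re


def parse_listing(text: str) -> dict:
    """Collect the declared length and residues for each <210> entry."""
    entries, current, in_sequence = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tag = re.match(r"<(\d{3})>\s*(.*)", line)
        if tag:
            code, value = tag.group(1), tag.group(2).strip()
            if code == "210":
                current = {"id": int(value), "length": None, "residues": ""}
                entries[current["id"]] = current
            elif code == "211" and current is not None:
                current["length"] = int(value)
            in_sequence = code == "400"
        elif in_sequence and current is not None:
            # Drop spacing and any trailing residue count printed after the sequence.
            current["residues"] += re.sub(r"[\s\d]", "", line.lower())
    return entries


def find_problems(entries: dict, alphabet: str = "acgtn") -> list:
    """Report entries whose length or characters disagree with the declaration."""
    problems = []
    for entry in entries.values():
        if entry["length"] != len(entry["residues"]):
            problems.append((entry["id"], "length"))
        if set(entry["residues"]) - set(alphabet):
            problems.append((entry["id"], "alphabet"))
    return problems


sample = """<210> 28
<211> 16
<212> DNA
<400> 28

ctgacgcatc caattc
"""
print(find_problems(parse_listing(sample)))
```

A check of this kind flags a scanned residue line that contains a character outside a, c, g, t, and n, or whose length differs from the declared value.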






